SYSTEMATIC DISCOVERY OF NEW GENES AND
GENES DISCOVERED THEREBY 

INVENTORS: Qiandong Zeng, Marco M. Kessler, and Guillaume Cottarel

APPENDIX: Sequence Listing is submitted on CD-ROM and is herein 
incorporated by reference in its entirety. 

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. § 119 to U.S.
Provisional Application Nos. 60/271,406 entitled "Systematic Discovery of
New Genes" filed February 27, 2001 and 60/333,726 entitled "Systematic
Discovery of New Genes and Genes Discovered Thereby" filed
November 29, 2001, the entire contents of which are hereby incorporated by
reference.

BACKGROUND OF THE INVENTION 

The genomes of organisms are large stretches of DNA. In many
organisms, the function of a great part of the genome is unknown since it does
not contain encoded genes. Because of advances in computerization, genomic
sequences are being deposited in public databases at a dramatic rate.
However, this information will be of little value to biologists if the tools to
manage and interpret the information are not available and are not reliable.

Today's scientists use advanced quantitative analysis and database
comparisons to better manage the genetic information, and identify and define
the relationship between sequences and the corresponding phenotypes.
Increasingly, molecular genetics is shifting from the laboratory to the
computer. However, the process of detecting genes in these sequences is still
relatively slow.

One promising use of bioinformatics to increase the efficiency of
research involves studying a genome to determine its sequence and its
relationship to other sequences and genes in that genome and in other
organisms. This information is of significant interest to pharmaceutical and
biomedical research to, for example, assist in the evaluation of drug efficacy
and resistance. Genetic databases for organisms such as Saccharomyces
cerevisiae, Escherichia coli and Mycoplasma pneumoniae are publicly
available, but the ability to manipulate this data is limited. To make the
manipulation of genomic information easier, sophisticated databases and
search programs have been developed.

Some well-known databases of genetic information include
GenBank™, SwissProt and OMIM™ (Online Mendelian Inheritance in Man).
GenBank™ is the National Institutes of Health (NIH) genetic sequence
database, an annotated collection of all publicly available DNA sequences
(Nucl. Acids Res. (2000) 28:15-8). There were approximately 10,336,000,000
bases in 9,103,000 sequence records as of October 2000 (see
www.ncbi.nlm.nih.gov/Genbank/). GenBank™ is part of the International
Nucleotide Sequence Database Collaboration, which comprises the DNA
DataBank of Japan (DDBJ), the European Molecular Biology Laboratory
(EMBL), and GenBank™ at the NIH.

SwissProt is an annotated protein sequence database established in 
1986 and maintained collaboratively by the Swiss Institute for Bioinformatics 
(SIB) and the European Bioinformatics Institute (EBI). 

OMIM™ is a database catalog (www.ncbi.nlm.nih.gov/OMIM/) of
human genes and genetic disorders authored and edited by scientists at The
Johns Hopkins University. The database contains textual information and
references, as well as links to MEDLINE and sequence records.

The Entrez retrieval system, run by the National Center for
Biotechnology Information (NCBI) at the NIH, can search several linked
databases at a time. Entrez can search biomedical literature databases,
GenBank™, SwissProt and other protein databases, three-dimensional
macromolecular structures and OMIM. Searches can produce results in the
form of related sequences and structural neighbors.

A popular search algorithm is BLAST (Basic Local
Alignment Search Tool). BLAST is a set of similarity search programs
designed to explore all of the available sequence databases regardless of
whether the query is protein or DNA. The BLAST programs have been
designed for speed, with a minimal sacrifice of sensitivity to distant sequence
relationships. The scores assigned by a BLAST search have a well-defined
statistical interpretation, making real matches easier to distinguish from
random background hits. BLAST uses a heuristic algorithm which seeks local
as opposed to global alignments and is therefore able to detect relationships
among sequences which share only isolated regions of similarity (Altschul,
S.F. et al. (1990) "Methods for assessing the statistical significance of
molecular sequence features by using general scoring schemes," Proc. Natl.
Acad. Sci. USA, 87: 2264-2268).

Despite the strong computational biomolecular databases and search
engines currently available, manual evaluation of the data produced is often
required. Biological macromolecules exhibit many non-random features, most
notably repetitive sequences and non-coding introns of genomic DNA. These
typically require extensive evaluation of database matches that are found,
which is a subjective, error-prone and tedious process. Present computational
biology methods used to determine the number of coding sequences include
promoter studies (Rainer, N. et al. (1999) Yeast 15:1775), codon usage
(Staden, R. and McLachlan, A.D. (1982) Nucl. Acids Res. 10:141), or some
combination of these methods. These procedures are based on current
knowledge of gene function, and have a number of limitations.

In addition, there is evidence that the current computational methods
for assessing coding potential often fail to identify open reading frames
(ORFs) that are discovered through experimental and other non-computational
methods. While sequence similarity search programs are a quick and versatile
tool, frequently able to identify putative coding regions, the accuracy of the
present methods is often compromised by factors such as differential and
tissue-specific splicing, genes within genes (i.e., polycistronic coding
domains) and the need for species-specific parameters. From a statistical
standpoint, the accuracy of known methods is extremely dependent on the
choice of scoring system, statistical significance of alignments, sequence
redundancy and the masking of confounding sequence regions.

For example, Serial Analysis of Gene Expression, or SAGE, is a
technique designed to take advantage of high-throughput sequencing
technology to obtain a profile of cellular gene expression. Essentially, the
SAGE technique does not measure the expression level of a gene directly, but
quantifies a "tag", which represents the transcription product of a gene. A
SAGE tag is a nucleotide sequence of a defined length, directly 3'-adjacent to
the 3'-most restriction site for a particular restriction enzyme. The data
product of the SAGE technique is a list of tags with their corresponding count
values, and thus is a digital representation of cellular gene expression.
However, the SAGE method often sacrifices accuracy and fidelity in both the
assignment of tags to genes and the ability to quantify a gene's expression
level in order to increase throughput.

The need for an in silico (i.e., computational) method to identify new
coding genes with the speed and versatility of the presently known methods,
but with increased accuracy and lack of bias, is increasing exponentially in
conjunction with the increasing accumulation of known sequences.

In addition to accurate methods, it is also important to have a model
that lends itself well to research. In attempts to sequence and annotate the
human genome, scientists have turned to the genomes of other organisms to
use as models. One genome often used is that of the
single-cell eukaryote, Saccharomyces cerevisiae (baker's yeast).
Saccharomyces is amenable to genetic and biochemical manipulations, and
many processes that occur in yeast also occur in larger eukaryotes, making
yeast a model system for the study of eukaryotes, including humans. The
genome of the yeast model system Saccharomyces cerevisiae was the very first
eukaryotic genome to be completely sequenced (Goffeau, A. et al. (1996)
Science 274:546) and is the subject of intensive research. The current
consensus suggests that the number of yeast genes that are 100 amino acids or
longer is in the range of 6000 (Goffeau (1996); Mewes, H.W. et al. (1997)
Nature 387(6632 Suppl):7; and Winzeler, E.A. and Davis, R.W. (1997) Curr.
Opin. Genet. Dev. 7:771), excluding a subset of small ORFs (Basrai, M.A. et al.
(1999) Mol. Cell. Biol. 19:7041; and Velculescu, V.E. et al. (1997) Cell
88:243). Recent genetic studies designed to catalog all genome transcripts,
using SAGE technology (Velculescu, V.E. et al. (1997)) and the analysis of a
collection of transposon insertions (Ross-Macdonald, P. et al. (1999) Nature
402:413), have discovered new ORFs, which were not previously identified
in silico. This pool of novel genes includes some putative proteins that are
optimally shorter than 100 amino acids. However, determination of ORFs
encoding polypeptides greater than 100 amino acids is also contemplated
using the methods described herein.

SUMMARY OF THE INVENTION 

This invention relates to a systematic in silico method to identify new
coding sequences, including homologs of coding sequences, in S. cerevisiae
and other organisms. The method of the present invention compares ORFs of
a first organism to a comprehensive database of sequences from related
organisms to identify homologs. The results of this method using
comprehensive database searches and experimental studies suggest that the
number of coding genes in, for example, S. cerevisiae, is substantially higher
than currently believed.

Another embodiment of the present invention comprises a method
comprising the following steps:

(A) collecting genomic sequence of the first organism;

(B) identifying stop-to-stop ORFs of the first organism;

(C) translating the stop-to-stop ORFs into polypeptide sequences;

(D) comparing the polypeptide sequences of the first organism to
amino acid translations of genomic libraries comprising genomes of other
organisms; and

(E) identifying, based on sequence identity, ORFs of the first organism
that are present in the other organisms, wherein the identified ORFs are coding
ORFs.

The ORFs are typically determined using the start codon AUG and
stop codons UAA, UAG and UGA. However, the method also contemplates
genome analysis with the less conventional start and stop codons discussed
infra.
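
By way of illustration only, steps (B) and (C) above can be sketched in
Python as follows. The function names, the default minimum length of 17
residues, and the choice to discard segments that are not bounded by a stop
codon on both sides are assumptions made for this example and are not taken
from the disclosed implementation. The sketch operates on DNA, so the stop
codons appear as TAA, TAG and TGA, and it uses the standard genetic code only.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def stop_to_stop_orfs(dna, min_len=17):
    """Collect stop-to-stop ORFs from all six reading frames.

    Returns (strand, frame, peptide) tuples. Only segments flanked by a
    stop codon on both sides are kept (an assumption of this sketch).
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            segments = translate(seq[frame:]).split("*")
            # segments[0] and segments[-1] touch the sequence ends,
            # so they are not bounded by two stop codons.
            for peptide in segments[1:-1]:
                if len(peptide) >= min_len:
                    orfs.append((strand, frame, peptide))
    return orfs


# Toy input: stop codon, three sense codons, stop codon.
print(stop_to_stop_orfs("TAAATGGCTGCTTAA", min_len=3))
```

In practice the minimum length would be left at about 17 residues, and the
resulting peptides would be passed on to the comparison of step (D).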

In one embodiment, the method comprises using BLAST with a
p-value of less than 1. In another embodiment, FASTA is used, preferably with
settings equivalent to those for BLAST with a p-value of less than 1.

In another embodiment, the invention comprises a method of
identifying ORFs in a genome of a first organism comprising the steps of: (A)
collecting genomic sequence of the first organism; (B) comparing the genomic
sequence of the first organism to one or more other genomic libraries
comprising genomes of other organisms containing ORFs; and (C)
determining ORFs for the first organism based on the comparison. The ORFs
of step (B) are ORFs that have previously been described.

The nucleic acid and amino acid sequences of the organism being
studied may have at least about 20%, more preferably 25%, and more
preferably at least 30% sequence identity to known sequences.

The algorithm used would provide results equivalent to those obtained
using BLAST wherein the p-value is less than 1.

The database may be a database of nucleotide sequences from a species
related to the organism (e.g., S. cerevisiae and S. pombe) and a database of
eukaryotic or prokaryotic nucleotide sequences. Specifically, the organism
source of the eukaryotic nucleotide sequences may include, but is not limited
to, primate, equine, bovine, caprine, ovine, porcine, feline, canine, lupine,
camelid, cervidae, rodent, avian and ichthyes. The primate may be a human.
Other organisms include vertebrates (e.g., mammals, birds, fish, and reptiles),
invertebrates (e.g., worms), and plants.

In another embodiment, the organism can be a fungus of the phylum
oomycota, chytridiomycota, zygomycota, ascomycota, basidiomycota or
deuteromycota. Preferably, the fungus is yeast of the phylum ascomycota.
More preferably, the yeast is the genus Saccharomyces or
Schizosaccharomyces. Most preferably the yeast is the species S. cerevisiae or
S. pombe.

The long genes are preferably about 100 or more amino acids in length.
The smORFs preferably are less than about 100 amino acids; however, they
can include polypeptides longer than 100 amino acids.

The smORFs isolated as described herein can be utilized in, for
example, a microarray. For instance, a nucleic acid microarray is fabricated
by high-speed robotics, generally on glass but sometimes on nylon or silicon
substrates, for which probes with known identity are used to determine
complementary binding. These arrays permit massively parallel gene
expression and gene discovery studies. This technology allows researchers to
monitor the whole genome on a single chip so that they have a better picture of
the interactions among the thousands of genes simultaneously.

The present invention relates to a smORF identified using the methods
of the present invention, as well as a vector comprising the smORF and a cell
comprising the vector. The cell preferably expresses the polypeptide encoded
by the smORF. Further, the present invention relates to a nucleic acid that
hybridizes to the sense or the antisense strand of the smORF, as well as an
isolated polypeptide encoded by the smORF.

This invention also relates to 119 novel coding sequences (SEQ ID
NOS: 1-119) from the S. cerevisiae genome discovered using the methods of
the instant invention, or fragments thereof, and optionally, a sequence required
for an amplification reaction. The fragment may be a primer. The invention
further relates to an isolated polypeptide selected from the group consisting of
SEQ ID NOS: 674-1346 and preferably SEQ ID NOS: 674-792, which appear
to be expressed and, in some instances, essential. The polypeptides should
comprise at least 5 or 10 or more contiguous amino acids of these
sequences.

The present invention also relates to methods of modulating the genes
and gene products identified using an in silico method described herein and
identifying such modulating agents. Preferred modulating agents include
antibiotics, antifungals and antisense agents. Modulating agents are generally
compounds or compositions that modulate the biological activity of a gene,
its transcript or the protein(s) encoded by that gene.

In another embodiment, the polypeptide or biologically active
fragment thereof is in the form of a composition with a pharmaceutically
acceptable carrier or excipient.

The present invention further relates to antibodies and
immunologically active fragments thereof that recognize and bind to a smORF
polypeptide or fragment thereof. These antibodies can be human antibodies,
humanized or primatized® antibodies, monoclonal antibodies or bispecific
antibodies. A further embodiment of the invention includes immunologically
active fragments of the antibodies, such as Fab, Fab', F(ab')2, Fv, scFv, and Fd.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 outlines the first steps of the strategy for new smORF 
identification using computational methods to identify new ORFs not 
identified by conventional methods. 

Figures 2A-2E show the experimental validation of the S. cerevisiae
smORFs. Fig. 2A shows the control experiments demonstrating that the RNA
used for the RT-PCR experiment was not contaminated with genomic DNA.
Fig. 2B shows the principle behind and the results of orientation-specific
RT-PCR, thus demonstrating that the transcripts observed originate from the
predicted DNA strand. Figs. 2D and 2E show more examples of transcripts
detected from the smORFs.

Figure 3 shows three yeast smORFs, which have highly conserved
homologs in other fungi and illustrates that two have highly conserved
homologs in mammalian species. Figure 3 shows the multiple sequence
alignment of smORF18 (SEQ ID NO: 677) and its homologs, smORF139
(SEQ ID NO: 709) and its homologs, and smORF570 (SEQ ID NO: 769) and
its homologs. Abbreviations: Dm, Drosophila melanogaster; Hs, Homo
sapiens; Ce, Caenorhabditis elegans; Sc, Saccharomyces cerevisiae; Ca,
Candida albicans; Af, Aspergillus fumigatus; An, Aspergillus nidulans; Sp,
Schizosaccharomyces pombe; Bt, Bos taurus; and Mm, Mus musculus.
Residues that are identical or similar in all protein homologs are shaded in
black and those identical or similar in two or more, but not all proteins in the
alignment are shaded in gray. Homology shading was done with GeneDoc
(Nicholas, K. B., et al. (1997), EMBnet News 4: 14).

Figure 4 shows experimental evidence that smORF18 (SEQ ID NO: 4)
codes for a polypeptide of the expected size. A triple HA-tag was fused to the
C-terminal end of smORF18 using PCR, and the wild-type smORF18 gene
was replaced by the tagged smORF18 gene by allele replacement into the
chromosome. Soluble extracts were prepared and analyzed by Western blot
analysis using monoclonal antibodies that recognize the HA epitope. Shown
are extracts from wild-type cells (lane 2) and extracts from two separate
isolates carrying the HA-tagged smORF18 (lanes 3 and 4).

Figure 5. Human smORF18 homolog complementation of the
temperature sensitive (ts) phenotype of the smorf18Δ strain. A yeast strain
with a deleted smORF18 (smorfΔ) was transformed with plasmids carrying the
wild-type yeast smORF18 (SEQ ID NO: 4), or the human smORF18 ORF
under the control of the GAL1 promoter or empty vector. Transformants were
then plated at 30°C and 37°C.

Figure 6. Diagram of smORF57 protein interaction map. The arrows
indicate the orientation of each two-hybrid interaction.

DETAILED DESCRIPTION OF THE INVENTION

I. Definitions 

As used herein, the term "gene" refers to the fundamental physical and
functional unit of heredity, which carries information from one generation to
the next. A gene is a segment of DNA composed of a transcribed region and
regulatory sequences that make possible transcription of the DNA.

As used herein, the term "organism" refers to eukaryotes and 
prokaryotes. 

As used herein, the term "known sequence" refers to a sequence (e.g.,
nucleic acid or amino acid) of any type publicly available and annotated.

As used herein, the term "long gene" refers to a gene that encodes a
polypeptide of about 100 amino acids or more. Long genes can include genes
encoding a polypeptide that is 100, 110, 120, 130, 140, 150, 175, 200, 300,
400, 500, 600, 750 and 1000 amino acids long or greater.

As used herein, the term "homolog" refers to a gene and protein coded
thereby from one species with similarities to another gene and its encoded
protein of the same species or among different species. These similarities can
be based on structural (e.g., sequence similarity and/or three-dimensional
commonality) and/or functional similarities (e.g., enzymatic and/or
biochemical activity).

As used herein, the term "ortholog" refers to a gene and protein
encoded thereby from one species which corresponds to a gene and its
associated protein in another species that is related via a common ancestral
species (a homologous gene), but which has evolved to become different from
the gene of the other species.

As used herein, the term "ORF" refers to an open reading frame, which
corresponds to a nucleotide sequence that could potentially be translated into a
polypeptide. For the purposes of this application, an ORF may be any part of
a coding sequence, with or without stop codons. An ORF is usually not
considered to be equivalent to a gene locus until an mRNA transcript for a
gene product has been detected and/or the ORF's protein product has been
identified.

As used herein, the term "smORF" preferably refers to a small open
reading frame that encodes a polypeptide of less than 100 amino acids.
However, the methods described herein can also be used to identify ORFs
which encode polypeptides more than 100 amino acids long (e.g., 100, 125,
150, 200, 300, 400, 500, etc. amino acids long). smORFs may encode a
polypeptide of at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 and 100 amino acids.
Preferably, smORFs encode polypeptides of 17 or 18 to 100 amino acids long.
The nucleic acids encoding these polypeptides accordingly include nucleic
acids that are 15 to 300 nucleotides in length or any number of nucleotides
between that range. The nucleic acid can be any that encodes the identified
smORF protein, including synthetic nucleic acids and the wild-type nucleic
acid. Preferred nucleic acids will have at least 8 contiguous nucleotides.
However, other nucleic acids may have from 8 to 300 or more contiguous
nucleotides, or any number lying within that range (e.g., 25, 75, and the like).

As used herein, "annotation" refers to the description of the properties
of a given sequence or gene, such as the protein encoded by the gene, function
of the protein, its domain structure, post-translational modifications, variants,
etc.

As used herein, the term "in silico" refers to a computational method of 
analyzing nucleic acid and/or amino acid sequences. 

As used herein, the term "sequence identity" refers to the relatedness of
two genetic sequences, as represented by the percentage of the amino acids
and/or nucleotides they share.
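
As a hedged illustration of this definition, percent identity over a pairwise
alignment might be computed as in the Python sketch below. The function name
and the convention of excluding gap columns from the denominator are
assumptions of this example; alignment programs differ in how they choose the
denominator, and the present text does not prescribe one.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned (non-gap) positions at which two sequences agree.

    Both inputs must be rows of the same alignment, hence equal in length.
    Columns containing a gap in either row are ignored (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Five comparable columns, four of them identical.
print(percent_identity("MKT-LV", "MRTALV"))
```

Under this convention, a predicted smORF polypeptide sharing 4 of 20 aligned
residues with a database sequence would sit exactly at a 20% threshold.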

As used herein, the term "sequence homology" defines regions of DNA
sequence, which are the same at different locations of the genome, or between
different DNA molecules such as between the genome and a plasmid or DNA
fragment.

As used herein, the term "microarray" (also referred to as "biochip"
and "DNA chip") refers to a microarray comprising nucleic acids. A
microarray is fabricated by high-speed robotics, generally on glass but
sometimes on nylon or silicon substrates, for which probes with known
identity are used to determine complementary binding, thus allowing parallel
gene expression and gene discovery studies. This technology allows
researchers to monitor the whole genome on a single chip so that they have a
better picture of the interactions among the thousands of genes
simultaneously.

As used herein, the term "fragment thereof" refers to an incomplete
and/or spliced section of the smORFs of the present invention. By
"biologically active" is meant that portion of the smORF that retains
biological activity. For example, for a nucleic acid, it might be the activity of
binding to a cognate strand. With reference to a polypeptide, by biologically
active is meant that portion which is, for example, immunogenic or has an
antigenic epitope, or that has enzymatic activity.

As used herein, the term "false positives" refers to a test result, which
erroneously assigns the test subject to a specific group, due to insufficiently
exact methods of testing.

As used herein, the term "false negatives" refers to a test result, which
excludes the test subject from a specific group, due to insufficiently exact
methods of testing.

As used herein, the term "hits" refers to when a database/computer 
reviews the information cache stored therein and finds data meeting the chosen 
parameters; the result is called a "hit." 

As used herein, the term "ESTs" ("expressed sequence tags") refers to
a short strand of DNA, which is part of a cDNA. Because an EST is usually
unique to a particular cDNA, and because cDNAs correspond to a particular
gene in the genome, ESTs can be used to help identify unknown genes and to
map their position in the genome.

As used herein, the term "RT-PCR" refers to reverse
transcriptase-polymerase chain reaction. In this process, mRNA is subjected
to reverse transcriptase, resulting in the production of cDNA complementary
to the mRNA. Large amounts of selected cDNA can then be produced by
means of the polymerase chain reaction.

As used herein, the term "database" refers to a large collection of
genetic data organized especially for rapid search and retrieval by computer.

As used herein, the term "algorithm" refers to a step-by-step procedure
for solving a problem or accomplishing some end, especially by a computer.
Specifically, the term "algorithm" refers to a search algorithm used to locate
specific data from a genetic database.

As used herein, the term "amplification reaction" refers to a reaction 
causing an increase in the number of copies of a specific DNA fragment, such 
as the polymerase chain reaction (PCR). 

The polypeptide of the present invention is preferably in an isolated
form. As used herein, the term "isolated polypeptide" refers to a polypeptide
removed from its native environment. Thus, a polypeptide produced and
contained within a recombinant host cell would be considered "isolated" for
the purposes of the present invention. Also intended as an "isolated
polypeptide" are polypeptides that have been purified, partially or
substantially, from a recombinant host. Similarly, by "isolated nucleic acid"
or "isolated polynucleotide" is meant a nucleic acid sequence, which is
purified from other nucleic acid and protein contaminants.

As used herein, the term "NrProtein database" refers to the
non-redundant protein database, one of the databases available for searching
using the BLAST algorithm.

The present invention is directed to methods of identifying new genes
in the genome of an organism. The method comprises the steps of removing
all annotated ORFs and long genes from the organism's genome and then
isolating small ORFs (smORFs) of preferably less than 100 amino acids.
These smORFs have at least a 20% sequence identity to all known sequences
from related organisms, determined by searching a database using a search
algorithm. The methods may further comprise the steps of identifying the
smORFs that are coding ORFs and verifying, using molecular genetics tools,
that the smORFs are transcribed into RNA.

The present invention is also directed to 119 novel ORFs (SEQ ID
NOS: 1-119) and their corresponding proteins (SEQ ID NOS: 674-792) from
the S. cerevisiae genome, which were identified through the methods of the
present invention as set forth in Table 2. The present invention is also directed
to 554 other ORF sequences (SEQ ID NOS: 120-673) and their corresponding
proteins (SEQ ID NOS: 793-1346) identified in S. cerevisiae using the
disclosed in silico method (see Table 2).

II. Identification of Novel Coding Sequences 

This invention relates to methods of identifying novel coding
sequences in an organism, for example, S. cerevisiae, as well as in other
prokaryotic and eukaryotic organisms. The methods of the present invention
would be appropriate for use on the genome of any organism, including, but
not limited to, plants (e.g., rice, maize, Arabidopsis), the plant pathogen
Phytophthora, invertebrates (e.g., nematodes, higher worms, fruit flies, etc.),
fish (e.g., zebrafish), mammals (e.g., mice, humans, etc.) and any of the other
organisms discussed herein.

One method of identifying new genes in the genome of an organism
comprises the steps of removing annotated ORFs and long genes, preferably
all known sequences, from the organism's genome, and then isolating small
ORFs (smORFs) comprising nucleic acid and amino acid sequences,
preferably predicted amino acid sequences having at least a 20% sequence
identity to all known sequences, more preferably amino acid sequences from
related organisms, wherein percent identity is determined using an algorithm
with parameter settings consisting essentially of or equivalent to a p-value of
less than 1 used in conjunction with a BLAST algorithm to search a database
of genetic information.

Preferably, the methods of the present invention are especially
adaptable for whole fungal genomes. More preferably, the fungus is yeast.
Most preferably, the yeast is S. cerevisiae or C. albicans. Accordingly, one
embodiment of the present invention is a method of identifying new genes in
the genome of S. cerevisiae comprising the steps of removing all annotated
ORFs and long genes from the S. cerevisiae genome, and then isolating small
ORFs (smORFs) comprising predicted amino acid sequences having at least a
20% sequence identity to all known fungal amino acid sequences, wherein
percent identity is determined using an algorithm. For example, if the
algorithm is BLAST, the parameters comprise a p-value of less than 1. Other
algorithms contemplated would use parameters producing similar results as
would be known to the artisan of ordinary skill.
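
For orientation, BLAST-style statistics relate the expected number E of chance
alignments scoring at least as well as a given hit to the probability P of
observing at least one such alignment by P = 1 - exp(-E). The Python sketch
below converts between the two quantities. It assumes that the p-value
referred to above is this P, which the present text does not state explicitly,
and the function names are hypothetical.

```python
import math


def e_value_to_p_value(e_value):
    """P = 1 - exp(-E), computed with expm1 for accuracy at small E."""
    return -math.expm1(-e_value)


def p_value_to_e_value(p_value):
    """Inverse relation E = -ln(1 - P); defined only for 0 <= P < 1."""
    if not 0.0 <= p_value < 1.0:
        raise ValueError("p_value must satisfy 0 <= P < 1")
    return -math.log1p(-p_value)


for e in (1e-10, 0.01, 1.0, 10.0):
    print(e, e_value_to_p_value(e))
```

Under this assumption, a cutoff of P less than 1 admits every hit with a
finite E, so the threshold is deliberately permissive and the identity
criterion does most of the filtering.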

A comparison of the yeast S. cerevisiae ORFs with a comprehensive
fungal database (excluding S. cerevisiae) suggests that most budding yeast
ORFs have homologs in other fungi. This led to the conceptualization and
validation of a new process for identifying novel coding sequences. For
example, this would include the following steps:

25 1 . Take one nucleic acid genome of an organism to 

probe (e.g., S. cerevisiae). 

2. Collect known nucleic acid sequences (e.g., genes) 
of the genome from step 1 . 

3. Optionally remove known genes. 

30 4. Optionally take the portions of genome remaining 

after the above steps (known or otherwise, but not known to 
contain genes, e.g., intergenic regions). 

5. Take either intergenic region or whole genome. 

6. Identify all open reading frames (ORFs) of 
35 preferably about 17 amino acids or longer stop-to-stop. 
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7. Perform a six-frame translation (three frames 
forward, and three frames backward to correspond to the 
complementary strand). 

8. Look for stop codons (*). Start counting residues 
5 right after the stop codon to the next stop codon. Take all the 

sequences that are preferably 17 amino acids or longer and call 
it an ORF (stop-to-stop). Typically, most programs identify 
sequences of at least 50 to 60 amino acids or longer. 

9. The novel step is then to construct a comprehensive 
10 database containing genomic DNA and cDNA sequences from 

as many organisms related to the subject as possible. For 
example, if the subject organism is S. cerevisiae, the database 
would include genomic and EST sequences from as many 
fungal species (excluding 5. cerevisiae) as available in the 
15 public and/or private databases, including C. albicans, 

Aspergillus nidulans, A. fumigatus, Schizosaccharomyces 
pombe, Neurospora crassa, Cryptococcus neoformans, 
Fusarium sporotrichioides, etc. 

10. The ORFs identified in steps 7 and 8 are then
compared against a six-frame translation of the nucleotide
sequences contained in the database described in step 9. For
example, if the organism being studied is S. cerevisiae, then the
ORFs identified in step 6 are compared against the nucleotide
sequences in the fungal database. Preferably, a comparison
algorithm such as TBLASTX is used. In the instance of
TBLASTX, the parameters preferably include a p-value of less
than 1. Comparable algorithms with comparable parameters
can also be utilized.

11. Compare the amino acid sequences using sequence
identity parameters.

12. Collect all the hits against entries in the database
(e.g., fungi).

13. A hit indicates that the ORF being studied
from the first organism (e.g., S. cerevisiae) is likely to be a
coding ORF (i.e., smORF), because it has predicted homologs
in the organisms contained in the database (e.g., fungal
database).

A. Compilation of Organism Genome and Removal of Annotated ORFs

For an ORF to be considered a good candidate for coding a
cellular protein, a minimum size requirement is often set. This is not the case
here. One novel characteristic of the present invention is that the small ORFs,
which are often discounted in genome analysis, are considered here.
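
The six-frame translation and stop-to-stop ORF collection of steps 6 to 8 above, with the short 17-residue minimum rather than the usual 50 to 60 residue cutoff, can be sketched in Python as follows. This is only an illustrative sketch and not the software used in the work described here; it assumes the standard genetic code, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard nuclear code is appropriate for the genome being scanned).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translate three forward frames and three reverse-strand frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def stop_to_stop_orfs(protein, min_len=17):
    """Return segments lying between two stop codons ('*') that are at
    least min_len residues long. Segments before the first stop or after
    the last stop are not bounded by two stops and are skipped."""
    segments = protein.split("*")
    return [s for s in segments[1:-1] if len(s) >= min_len]
```

For a genomic fragment `seq`, calling `stop_to_stop_orfs(frame)` on each value of `six_frame_translations(seq)` yields the candidate ORFs of that fragment in all six frames.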

The first step in the methods of the present invention is an examination
of the entire genome of the organism of choice, as outlined in Fig. 1. The
sequences of the genome of choice may be found anywhere, including, but not
limited to, GenBank™, EST sequence databases, Celera's recent human
genome database (Venter et al., "The Sequence of the Human Genome,"
Science 291: 1304-51 (2001)), and other organism genome databases as they
are elucidated. For example, the entire S. cerevisiae genomic sequence (12.07
Mb total) was obtained from the Saccharomyces Genome Database as of
December 5, 1997, and examined (see
http://genome-www.stanford.edu/Saccharomyces/).

B. The Isolation of smORFs Using Bioinformatics 

The next step in the method of the claimed invention is the isolation of
smORFs, by running the remaining ORFs obtained in the above steps against a
database of known genes to identify any potential homologs. The database
can be any searchable database in which homologous sequences can be identified.
Preferably, the database is searched using algorithms such as BLAST or
FASTA, or equivalent algorithms.

Specifically, a method of identifying new genes in the genome of an
organism comprises the step of removing all annotated ORFs and long genes
from the organism's genome. Alternatively, the removal of sequences does
not need to occur. This is followed by isolating small ORFs (smORFs)
comprising nucleic acid and amino acid sequences having at least 20%
sequence identity to all known sequences from related organisms. Preferably,
the comparison is of amino acid sequences.

The smORFs may have a sequence identity to all known sequences
from related organisms of about 20% or more. Preferably, the sequence
identity is at least about 25% and more preferably at least
about 30%.
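
As an illustration of how these thresholds might be applied, the following Python sketch computes percent identity over an alignment and reports which of the stated tiers (about 20%, 25% or 30%) it reaches. It assumes that identity is counted over the full alignment length, gaps included, which is the convention BLAST reports use; the function names and tier labels are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both
    sequences. Inputs are equal-length aligned strings with '-' for gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def identity_tier(pct):
    """Map a percent identity onto the tiers named in the text."""
    if pct >= 30.0:
        return "more preferred (at least about 30%)"
    if pct >= 25.0:
        return "preferred (at least about 25%)"
    if pct >= 20.0:
        return "acceptable (about 20% or more)"
    return "below threshold"
```

Because the text qualifies each threshold with "about", a practical implementation would likely treat values close to a boundary with some tolerance rather than as a hard cutoff.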

The first organism database searched and compared to another
organism may comprise a plurality of known genomic nucleotide sequences
and expressed sequence tags (ESTs). For example, the nucleic acids encoding
the polypeptide sequences of the present invention are analyzed using BLAST
against any type of sequence from a similar organism, including, but not limited
to, nucleotide sequences, protein sequences, peptide sequences and ESTs.

In this step, the database should be a database of nucleotide sequences
from a species related to the organism of choice. For example, the genome of
the yeast S. cerevisiae was searched against a database of all known fungal
sequences. Alternatively, the database may be a database of all eukaryotic
nucleotide sequences. Specifically, the organism source of the eukaryotic
nucleotide sequences may include, but is not limited to, primate, equine,
bovine, caprine, ovine, porcine, feline, canine, lupine, camelid, cervidae,
rodent, avian and ichthyes. If a primate database is searched, the primate is
preferably human.

The long genes removed from the genome are all genes of about 100 or
more amino acids. The small ORFs (smORFs), the preferred sequences of
interest in the present invention, are sequences of typically less than 100
amino acids. However, the methods of the invention can be used to identify
ORFs that encode polypeptides greater than 100 amino acids. One of the
novel features of the instant invention is the focus on ORFs that are small
and therefore previously excluded or not rigorously studied by researchers.

For example, in the present invention, the S. cerevisiae genome was
analyzed and the nucleotide sequences of the previously identified 6,224
coding ORFs were removed. Next, the remaining sequences (3.45 Mb) were
analyzed to identify all stop-to-stop ORFs of preferably about 17
or 18 residues or longer, based on the fact that in E. coli, the overwhelming
majority of genes code for proteins of about 17 or 18 amino acids
or longer (E. coli Genome Center, October 13, 1998, revision date, University
of Wisconsin, Madison, http://www.genetics.wisc.edu/). This analysis
produced approximately 140,000 ORFs, most of them shorter than 100
residues.
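
The removal of annotated coding ORFs to leave the remaining sequence can be pictured as interval subtraction along each chromosome. The Python sketch below is a hypothetical illustration, assuming 1-based inclusive coordinates and treating both strands together; it is not the procedure actually used to obtain the 3.45 Mb of remaining sequence.

```python
def intergenic_regions(chrom_length, annotated):
    """Return the (start, end) intervals of a chromosome that are not
    covered by any annotated interval.

    chrom_length: length of the chromosome in bases.
    annotated: iterable of (start, end) pairs, 1-based and inclusive;
               intervals may overlap and may be given in any order.
    """
    regions = []
    cursor = 1  # first base not yet known to be covered
    for start, end in sorted(annotated):
        if start > cursor:
            regions.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= chrom_length:
        regions.append((cursor, chrom_length))
    return regions
```

Each returned interval can then be extracted from the chromosome sequence and scanned for stop-to-stop ORFs.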

In isolating smORFs of an organism's genome, a microarray may be
used.

In one embodiment of the present invention, the ORFs thus identified were
searched against a comprehensive fungal sequence database to identify any
ORFs with potential homologs. This fungal database consisted of all NCBI
entries listed under "fungi" (August 20, 2000, excluding any S. cerevisiae
sequences), plus the genomic sequences from Candida albicans (Stanford
University) and Aspergillus fumigatus (PathoGenome™ database) (A.
fumigatus genomic sequences are available at http://www.LabOnWeb.com),
EST sequences from Aspergillus nidulans, Cryptococcus neoformans,
Fusarium sporotrichioides, and Neurospora crassa (University of Oklahoma
Health Sciences Center), and Pneumocystis carinii EST sequences (University
of Georgia). Using a cutoff score of p ≤ 10^-4 (a score of p ≤ 10^-4 was chosen,
since it is reasonably stringent for small ORFs), 1057 S. cerevisiae ORFs were
identified with potential homologs in the fungal database. Preferably, the p
value when using BLAST is a value less than 1. After removing smORFs
overlapping with rRNA, tRNA and retrotransposon elements (i.e., TY
elements), 673 smORFs were obtained (SEQ ID NOS: 1-673). Since
homologs of these budding yeast ORFs were found in at least one other fungal
species, it seems reasonable to predict that most of these 673 ORFs (SEQ ID
NOS: 1-673) are likely to be coding ORFs (Fig. 1), as further described in
Table 2.
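
The two filters described in this paragraph, a probability cutoff followed by removal of ORFs overlapping rRNA, tRNA or Ty elements, can be sketched in Python as below. The record layout, field names and the default cutoff of 1e-4 are assumptions made for illustration; the sketch does not reproduce the actual parsing of BLAST output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrfHit:
    """One database hit for a candidate ORF (hypothetical record layout)."""
    orf_id: str
    chrom: str
    start: int   # ORF start on the chromosome, 1-based inclusive
    end: int     # ORF end on the chromosome, 1-based inclusive
    p_value: float


@dataclass(frozen=True)
class Feature:
    """A genomic element whose overlapping ORFs are to be discarded."""
    kind: str    # e.g. "rRNA", "tRNA" or "Ty"
    chrom: str
    start: int
    end: int


def overlaps(hit, feature):
    """True if the ORF and the feature share at least one base."""
    return (hit.chrom == feature.chrom
            and hit.start <= feature.end
            and feature.start <= hit.end)


def select_candidates(hits, excluded_features, p_cutoff=1e-4):
    """Keep the best hit per ORF that passes the cutoff and does not
    overlap any excluded feature, sorted by ascending p-value."""
    best = {}
    for hit in hits:
        if hit.p_value > p_cutoff:
            continue
        if any(overlaps(hit, f) for f in excluded_features):
            continue
        previous = best.get(hit.orf_id)
        if previous is None or hit.p_value < previous.p_value:
            best[hit.orf_id] = hit
    return sorted(best.values(), key=lambda h: h.p_value)
```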

Table 2 describes the function of the genes and proteins of the present
invention. The first column contains the smORF designation number. The
nucleotide and amino acid sequences designated by their SEQ ID NOS are
contained in the second and third columns. The corresponding lengths of the
nucleotide and amino acid sequences are listed in the fourth and fifth columns,
respectively. BLAST scores and probabilities from the analysis described
herein are provided in the sixth and seventh columns, respectively. The
description of the gene and protein is contained in the eighth column. The
description field provides, where available, the accession number (AC) or
SwissProt accession number (SP), the locus name (LN), Superfamily
classification (CL), the organism (OR), the source of variant (SR), the E.C.
number (EC), the gene name (GN), the product name (PN), the function
description (FN), the map position (MP), left end (LE), right end (RE), coding
direction (DI), the database from which the sequence originates (DB), and the
description (DE) or notes (NT) for each ORF.

C. Validation of the Novel Coding Sequences 

Finally, the smORFs identified using the methods of the present
invention may be validated as coding sequences that are transcribed into RNA
by the use of known experimental techniques such as reverse
transcriptase-polymerase chain reaction (RT-PCR). A subset (i.e., 154) of the
673 smORFs (SEQ ID NOS: 1-673) was chosen for analysis by RT-PCR. RT-PCR
analysis showed that a transcript could be demonstrated for 119 smORFs
(SEQ ID NOS: 1-119). With regard to any smORFs identified and validated
through the methods described above, the present invention further relates to a
vector comprising such a smORF, a cell comprising the vector, a polypeptide
encoded by the smORF and a nucleic acid which hybridizes to the sense or
antisense strand of a smORF identified using the methods of the present
invention, preferably under stringent conditions.

Stringency is a term used in hybridization experiments to denote the 
degree of homology between the probe and the filter bound nucleic acid; the 
higher the stringency, the higher percent homology between the probe and 
filter bound nucleic acid. If the stringency is too low, unspecific hybridization 
may occur. If the stringency is too high, only a weak or no signal may be 
observed. For any hybridization, stringency can be varied by manipulation of 
three factors: temperature, salt concentration, and formamide concentration; 
however, stringent conditions are sequence-dependent and will differ 
depending on the circumstances. For example, longer sequences hybridize 
specifically at higher temperatures. Generally, highly stringent conditions are 
selected to be about 5-10°C lower than the thermal melting point (Tm) for the
specific sequence at a defined ionic strength and pH. Low stringency conditions
are generally selected to be about 15-30°C below the Tm. The Tm is the
temperature at which 50% of the probes complementary to the target hybridize
to the target sequence at equilibrium. Stringent conditions will be those in
which the salt concentration is less than about 1.0 M sodium ion, typically
about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3, 
and the temperature is at least about 30°C for short probes (e.g., about 10 to 
about 50 nucleotides) and at least about 60°C for long probes (e.g., greater 
than about 50 nucleotides). Stringent conditions may also be achieved with 
the addition of destabilizing agents such as formamide. 
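
The temperature windows stated in this paragraph can be expressed as a small helper. The Python sketch below simply encodes the offsets given above (about 5-10°C below the Tm for highly stringent conditions and about 15-30°C below the Tm for low stringency); the Tm itself is taken as an input, since its value depends on the sequence, ionic strength and pH, and the function name is hypothetical.

```python
# Offsets below Tm, in degrees Celsius, as stated in the paragraph above:
# (smallest offset, largest offset).
STRINGENCY_OFFSETS = {
    "high": (5.0, 10.0),
    "low": (15.0, 30.0),
}


def stringency_window(tm_celsius, level="high"):
    """Return the (lowest, highest) hybridization temperature in degrees
    Celsius for the requested stringency level, given a melting point."""
    try:
        smallest, largest = STRINGENCY_OFFSETS[level]
    except KeyError:
        raise ValueError(f"unknown stringency level: {level!r}") from None
    return (tm_celsius - largest, tm_celsius - smallest)
```

For a probe with a Tm of 70°C, this gives a window of 60 to 65°C for highly stringent conditions and 40 to 55°C for low stringency.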

The degree of hybridization may also depend on the amount of identity
between the sequences. Preferably the region of identity is greater than about
5 bp, more preferably the region of identity is greater than 10 bp.

Stringent hybridization conditions are known in the art and include, but
are not limited to: (a) washing with 0.1X SSPE (0.62 M NaCl, 0.06 M
NaH2PO4·H2O, 0.075 M EDTA, pH 7.4) and 0.1% sodium dodecyl sulfate
(SDS) at 50°C; (b) washing with 50% formamide, 5X SSC (0.75 M NaCl,
0.075 M sodium citrate), 50 mM sodium phosphate (pH 6-8), 0.1% sodium
pyrophosphate, 5X Denhardt's solution, sonicated salmon sperm DNA (50
µg/ml), 0.1% SDS and 10% dextran sulfate at 42°C, followed by washing at
42°C in 0.2X SSC and 0.1% SDS; and (c) washing with 0.5 M NaPO4, 7%
SDS at 65°C followed by washing at 60°C in 0.5X SSC and 0.1% SDS. High
stringency hybridization conditions are those performed at about 20°C below
the melting temperature (Tm). Preferred stringency is performed at about
5-10°C below the melting temperature (Tm). Additional hybridization
conditions can be prepared as found in chapter 11 of Sambrook et al., (1989)
Molecular Cloning: A Laboratory Manual, 2d Ed., Cold Spring Harbor
Laboratory Press, or as would be known to the artisan of ordinary skill.

Extensive guides to the hybridization of nucleic acids and sequence
identity can be found in Sambrook et al., (1992) Molecular Cloning: A
Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory Press and Ausubel
et al., (1995) Current Protocols in Molecular Biology, Greene Publishing Co.,
NY.

We have developed and validated a novel method for gene
identification in sequenced genomes and used it to identify new genes in
S. cerevisiae. With this method, one should be able to find new coding ORFs
in S. cerevisiae or other yeasts by simply searching potential budding yeast
ORFs against other fungal species. Even though our experimental design was
purposely non-exhaustive to demonstrate the proof of principle and the
validity of this gene discovery process, we found strong evidence for several
hundred new genes in the S. cerevisiae genome. For the three new genes
selected for detailed analysis and experimental studies, we identified orthologs
in other fungal species, as well as in other eukaryotes (e.g., mammals). This
example can be expanded to include smORFs that partially overlap with
annotated ORFs and smORFs that are completely located within previously
annotated ORFs. The identification of conserved genes across a wide range of
species provides the opportunity to use S. cerevisiae and/or other fungi to
study the function of their counterparts in humans. In addition, the disclosed
methods can be applied to other sequenced genomes, including humans, in
order to identify coding ORFs not previously detected using conventional
methods. This novel genome comparison approach to identify new ORFs will
accelerate genome annotation and gene identification.



III. Novel smORF Sequences Identified

To establish a proof of principle and verify this new method, a case
study was done using the budding yeast genome, because it is one of the most
exhaustively studied biological systems. Consequently, analysis of this
genome to identify new genes not previously described is a rigorous test of the
system, challenging the present methods used to identify new genes.

The new smORFs identified using the methods described herein were
then subjected to a validation step. A comprehensive analysis of three
smORFs was performed as a means of verifying their ability to encode a
polypeptide. Most of the analysis was done with the Compas™ package
(Genome Therapeutics Corporation), which performs a database search, as
well as identification of such structural elements as motif, protein family
(pfam), helix-turn-helix, coiled-coil and signal peptide, to name a few;
Compas™ also identifies protein secondary structure and predicts cellular
location. We identified a wide range of homologs in other species for all three
smORFs. SmORF18 and smORF570 have homologs in fungi and mammals
(Fig. 3). SmORF18 also has plant homologs. Homologs of smORF139 have been
found only in fungi so far (Fig. 3). SmORF18 seems to be part of a larger
protein in Arabidopsis thaliana, Sorghum bicolor, Oryza sativa, Glycine max
and other plants, but the orthologs in human, Caenorhabditis elegans,
Drosophila melanogaster, and Schizosaccharomyces pombe are about the
same length as the S. cerevisiae smORF.

While the patches of highly conserved residues in the homologs for the
three smORFs strongly suggest that these ORFs encode proteins, the definitive
proof came from experimental work, wherein molecular genetics tools were
used to confirm that these smORFs are transcribed into RNA. Primers were
designed to amplify the three smORFs as well as the ACT1 gene (actin) control.
The primers were chosen to give a PCR amplification product of 250 to 300 base
pairs that lies inside the ORFs. Examples of primers for the ACT1 gene and
three smORFs are shown in Table 1. These primers were used for PCR
amplification of S. cerevisiae genomic DNA (template) to test the PCR
amplification conditions (yeast genomic DNA was prepared from strain W303
using the Yeastar Genomic DNA kit (Zymo Research) as suggested by the
manufacturer).

Table 1

smORF              Primer Sequence                       SEQ ID NO

smORF18            S'-TOArOA A ATCGAAATCGAAG-3'
                   v PrATGrrTGrrTrTTrGTAGT-T

smORF139           5'-TGCCTAAGAGATTAAGTGGGTT-3'
                   5'-CGTCAGTTCAGGGTGTGAAA-3'

smORF570           5'-TGTCTGCATTATTTAATTTTCGTTC-3'
                   5'-AGCTGTTAAATTGACTGATGGC-3'

yeast ACT1 gene    5'-TGTCACCAACTGGGACGATA-3'
                   5'-AACCAGCGTAAATTGGAACG-3'


Products of the predicted size were obtained for all three smORFs, as
well as the actin control (Fig. 2A, lanes 2, 6, 10, and 14). No PCR products
were obtained in reactions without template (Fig. 2A, lanes 1, 5, 9, and 13), or
using RNA isolated from S. cerevisiae grown on rich media (YEPD) or
complete synthetic minimal (CSM) media (Fig. 2A, lanes 3, 4, 7, 8, 11, 12, 15,
and 16). This indicates that these RNA samples were not contaminated with
genomic DNA (RNA was isolated from 5 × 10^7 yeast (strain W303) cells
growing exponentially in YEPD or synthetic complete minimal media using
the RNeasy™ Mini kit from Qiagen including a DNase (Roche) digestion
step). We then tested for the presence of RNA transcripts originating from
these smORFs, as well as from the actin control, using RT-PCR (RT-PCR
reactions were done with the OneStep RT-PCR Kit from Qiagen as
recommended by the manufacturer). Products of the expected sizes were
obtained for actin, as well as all three smORFs (Fig. 2B, lanes 2, 3, 5, 6, 8, 9,
11, and 12). This indicates that actin and the three smORFs are indeed
expressed in yeast cells grown in both rich and minimal media. No RT-PCR
product was obtained in reactions without template (negative control)
(Fig. 2B, lanes 1, 4, 7, and 10). The identity of the RT-PCR products was
confirmed by cloning. The RT-PCR products were isolated from an agarose
gel and then cloned into pCR2.1-TOPO (Invitrogen), as recommended by the
manufacturer. The clones were then restriction mapped and dideoxy
sequenced.

To determine whether the identified smORFs were indeed transcribed
from the predicted DNA strands, a modified RT-PCR experiment was
performed. First, a primer complementary to the predicted mRNA and the
reverse transcriptase were added. After first strand cDNA synthesis, the
reverse transcriptase was inactivated with heat. Taq polymerase and both
smORF-specific primers were then added (Fig. 2C). Under these conditions,
PCR products were observed only when first strand synthesis was conducted
with primers complementary to the predicted mRNA (lanes 5, 6, 11, 12, 17
and 18). No PCR product was observed if first strand synthesis was done with
primers that have the same sequence as the mRNA (lanes 3, 4, 9, 10, 15 and
16). These results indicate that the transcripts observed for smORFs 18, 139
and 570 (SEQ ID NOS: 4, 36 and 96) are made from the predicted strand.
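
The logic of this strand test can be illustrated in a few lines of Python. A primer can prime first strand synthesis only if it anneals to the mRNA, that is, if its reverse complement occurs within the mRNA sequence; a primer with the same sequence as the mRNA cannot. The sequence below is a made-up toy example written in the DNA alphabet (T in place of U), and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def anneals_to(primer, template):
    """True if the primer (5'->3') can base-pair with the template (5'->3'),
    i.e. the primer's reverse complement is found in the template."""
    return reverse_complement(primer) in template


# Toy mRNA, invented for illustration only.
mrna = "ATGGCTAAGGTTCTGGAACGTTAA"

# Primer complementary to the 3' end of the mRNA: primes first strand cDNA.
antisense_primer = reverse_complement(mrna[-12:])

# Primer with the same sequence as the 5' end of the mRNA: cannot prime on it.
sense_primer = mrna[:12]

can_prime_with_antisense = anneals_to(antisense_primer, mrna)
can_prime_with_sense = anneals_to(sense_primer, mrna)
```

If the transcript came from the opposite strand, the roles of the two primers would be swapped, which is why a product seen only with the primer complementary to the predicted mRNA identifies the transcribed strand.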

This same study was extended to 151 additional smORFs, most of which have
a potential homolog in the genome of C. albicans. The results show that an
RT-PCR product of the expected size was obtained for 116 of these smORFs
(Figs. 2D and 2E). Therefore, 119 of the 154 smORFs are transcribed from
the predicted DNA strand (Table 2). See SEQ ID NOS: 1-119.

To address the possibility that the observed smORF transcripts were
products of read-through transcription from genes located upstream from the
smORFs, the RT-PCR experiment was conducted using a primer
complementary to the mRNA for first strand synthesis (Fig. 2C) and with a
second primer located 400 base pairs upstream of the smORF. Under these
conditions, no RT-PCR product was observed, demonstrating that the smORF
transcripts were not the result of read-through transcription from upstream
genes.

Functional analysis can then be performed. For example, site-directed
mutagenesis can be performed to disrupt the function of each gene and
examine the resulting phenotypic changes, as would be known to the artisan of
ordinary skill. The three smORFs described here do not overlap with
previously annotated ORFs, and a start-to-stop ORF can clearly be
defined for each. These three ORFs are not duplicated in the budding yeast genome,
as only one copy of each ORF was identified in the genome. Additionally,
these S. cerevisiae smORFs have highly conserved homologs in other fungal
species (50 to 60% amino acid identity and 70 to 80% similarity). In the case
of smORFs 18 and 570 (SEQ ID NOS: 677 and 769, respectively), highly
conserved homologs could also be found in mammalian genes.
The yeast smORFs identified using the methods described herein are
described more fully below.

(i) Yeast smORF570. Comprehensive bioinformatics analysis of the
yeast smORF570 protein sequence (SEQ ID NO: 769) suggests that this
protein functions as a secreted protein. Using SigCleave (eGCG version 8),
we have identified three overlapping signals with scores of 11.6, 6.4 and 5.1,
in a region that extends from amino acid 9 through amino acid 29, with a
predicted cleavage site in the region of amino acids 22-27. Although
TopPredII suggests the presence of two transmembrane domains with
moderate certainty, the initial domain identified overlaps the SignalPeptide
prediction noted earlier and likely represents the hydrophobicity associated
with the SignalPeptide region. Given the presence of three conserved cysteine
residues within the protein, which are likely to represent sites of inter- or
intra-protein cross-linking, the second site identified by TopPredII is subthreshold
(below a certainty cut-off of 1.5) and is more consistent with hydrophobicity
that drives protein folding rather than a membrane spanning region. Taking
these data together, our analysis would support the function of smORF570 as a
secreted protein that could act as either a ligand, a soluble receptor or a
binding protein. Based on this information, smORF570 would also be a target
for antifungal agents and other therapeutics described herein.

The human homolog of smORF570 maps to Chromosome 19
(19q13.1), in a region with multiple olfactory receptors (AC005255, between
OLFR and MEL), though the gene itself was not identified. The human
smORF570 protein is 74% identical to its D. melanogaster homolog
(AE003512), 39% identical to its C. elegans counterpart, and 40% identical to
a novel gene expressed in human adrenal gland (AF164793). EST hits for the
human smORF570 homolog were found with bovine placenta, pig spleen
lambda, mouse irradiated colon, and embryonal carcinoma cell line F9. Based
on this information, the human homolog is most likely involved in cancer and
could serve as a therapeutic target.

(ii) Yeast smORF18. Of particular note is the sequence conservation
(31%) shared with the N-terminus of a chicken fas ligand receptor -
soluble form (AF296875, 285 amino acids, p = 0.84). The number and
spacing of Cys residues are also similar in the aligned portion of the two
proteins. EST hits were found in mouse placenta, Beddington mouse
dissected endoderm, rat kidney, rat embryo, and human placenta.

The conservation of residues across fungi suggests that smORF18
could be used as an antifungal target using the methods described herein. The
identity between the human smORF18 homolog and its counterparts in D.
melanogaster, C. elegans, and A. thaliana is 70%, 69% and 60%, respectively, at
the amino acid residue level. SmORF18 protein is also 31% identical to
Schizosaccharomyces pombe DnaJ heat-shock protein (316 amino acids).

To further demonstrate the validity of the method, a comprehensive
analysis of smORF18 was conducted. A wide range of homologs was
identified in other species (Fig. 3). SmORF18 seems to be part of a larger
protein in Arabidopsis thaliana, Sorghum bicolor, Oryza sativa, Glycine
max and other plants. The human, Caenorhabditis elegans, Drosophila
melanogaster and Schizosaccharomyces pombe smORF18 homologs are about
the same size as the S. cerevisiae smORF18 (SEQ ID NO: 677). SmORF18
(SEQ ID NO: 4) was recently annotated by Blandin et al., (FEBS Lett. 487:
31, 2000) and assigned the systematic name YBL071W-A.

Study of smORF18 (SEQ ID NO: 4) was extended to determine
whether a protein product of the appropriate size could be detected. A triple
HA-tag was fused to the C-terminus of smORF18 (SEQ ID NO: 4) by PCR.
First, a PCR amplification was made using a primer corresponding to 400 bp
upstream of smORF18 (L) and a second primer containing the C-terminus of
smORF18 fused to the HA-tag (5'-
GGAGCCTGATCCAGCGTAGTCTGGGACGTCGTATGGGTAGCCAGCGTAGT
CTGGGACGTCGTATGGGTAGCCAGCGTAATCCGGAACATCATACGG
GTATCCTACGGCAGCAGCGGCAATAGGCTCAGG-3') (SEQ ID NO: __).
A second amplification was carried out with a forward primer containing
the tag (5'-
GTAGGATACCCGTATGATGTTCCGGATTACGCTGGCTACCCATA
CGACGTCCCAGACTACGCTGGCTACCCATACGACGTCCCAGACTAC
GCTGGATCAGGCTCCTAAAGATGAGAGGCTAGATCGAG-3') (SEQ ID
NO: __) and a primer located downstream of smORF18 (5'-
TGTCGCTTTTTCTCCTCGATG
AAGCCAAGCGCCGAACCAATTGATATCATCGGCACG-3') (SEQ ID
NO: __). The wild-type smORF18 gene was replaced with the tagged version
by allele replacement into the chromosome (Erdeniz et al., 1997, Genome
Res. 7: 1174). PCR amplification of the smORF18-(HA)3 gene from genomic
DNA followed by cloning and sequencing confirmed the identity of the tagged
smORF18. For sequencing, PCR products were isolated from an agarose gel
and then cloned into pCR2.1-TOPO (Invitrogen). Soluble S100 extracts were
prepared from diploid W303 (B.J. Thomas et al., 1989, Genetics 123:725) and
from HA-tagged yeast cells grown in 25 ml of rich medium (YPD) to mid-log
phase as described (Brown et al., 1996, Mol Cell Biol 16: 5744). Soluble
extracts were then fractionated in 18% polyacrylamide gels containing SDS.
The proteins were then transferred to a PVDF membrane and the blot probed
with anti-HA antibodies. The results show a protein band corresponding to a 9
kDa protein (Fig. 4, lanes 3 and 4) in extracts prepared from cells with a
tagged smORF18 gene and not in wild-type cells. This result demonstrates
that smORF18 (SEQ ID NO: 4) is not only transcribed, but also encodes a
detectable protein product of the predicted size.

A next step in the identification and characterization of the
gene is to test whether the smORF is essential. For example, one copy of the
complete smORF18 gene was deleted in a diploid yeast strain by homologous
recombination. Cells were transformed with a PCR fragment containing the
HIS3 marker flanked by 400 bp of smORF18 sequences. The HIS3 sequence
replaced amino acids 1 to 82 of smORF18. Histidine prototrophs were
selected and PCR was used to verify correct genomic integration. Sporulation
and tetrad analysis showed that haploid strains with a smorf18Δ were able to
grow at 30°C (slow growth), but not at 37°C (Fig. 5). We next tested whether the
human smORF18 is a functional homolog of the yeast smORF18. The human
smORF18 gene, which was obtained from an EST clone, and the yeast
smORF18 were cloned into pYES (Invitrogen) vector for expression in yeast
under the GAL1 promoter. The human smORF18 coding sequence was
amplified from I.M.A.G.E. clone 1047404 (Research Genetics, Inc.). The
yeast smORF18 was amplified from genomic DNA. PCR fragments were
cloned into pYES2.1/V5-His-TOPO (Invitrogen). Clones were verified by
sequencing and transformed into the smorf18Δ strain. The resultant
transformants were tested for the ability to complement the temperature
sensitive phenotype of the smorf18Δ strain. The results demonstrate that the
cloned human smORF18 as well as the yeast smORF18 (SEQ ID NO: 4) can
complement the temperature sensitive phenotype of the smorf18Δ strain (Fig.
5). These results indicate that the human smORF18 is a functional ortholog of
yeast smORF18 (SEQ ID NO: 4). The human smORF18 maps to two loci in
the human genome, one in chromosome 3 where the gene contains two introns
and codes for a predicted mRNA identical to the EST, and a locus in
chromosome 20 (i.e., 20q13.2-13.33, AL035669) without introns but with nine
predicted amino acid substitutions. These data indicate that small ORFs are
present and expressed in humans and underscore the importance of looking
for small genes in the genomes of higher eukaryotes. smORF18 is essential
for growth of yeast at 37°C and has conserved homologs in organisms from
yeast to man. smORF18 was used as bait in the two-hybrid analysis to isolate
interactors. This gene is essential in yeast.

(iii) Yeast smORF139 (SEQ ID NO: 36). The smORF139 protein
(SEQ ID NO: 709) appears to be a conserved protein in fungi. However, the
conserved sequence, "LSGLQK", is shared with lamin B2 from Xenopus
laevis, chicken and human. The S. cerevisiae smORF139 protein is also 35%
identical to an unidentified protein (AC003000) from Arabidopsis thaliana
chromosome II (see below), and 33% identical to the middle section of
glutathione transferase (S33628) from Dianthus caryophyllus (clove pink).
SigCleave (eGCG version 8) identified a weak signal peptide (score 0.9) from
residue 13 to 26. No transmembrane domain was found. The A. fumigatus
version has an intron in the gene. SmORF139 (SEQ ID NO: 709) was found
in the region of the ade2 gene for phosphoribosylaminoimidazole carboxylase
and the pheromone response protein (RGA1) in Zygosaccharomyces rouxii.
smORF139 (SEQ ID NO: 628) from S. cerevisiae is 74% identical to an
unknown protein in Zygosaccharomyces rouxii. S. cerevisiae smORF139 also
has a hit (38% identity) to a Medicago truncatula (plant) EST sequence
(AW584424).

The smORF139 protein (SEQ ID NO: 709) is 35% identical to
"Arabidopsis thaliana protein fragment SEQ ID NO: 1495" disclosed by
Ceres Inc., on 25-FEB-1999. The smORF139 is, however, conserved among
fungi and, therefore, could be used as a target for the antifungal compositions
described herein.

(iv) Yeast smORF57. smORF57 (SEQ ID NO: 13) is conserved
between S. cerevisiae and C. albicans. The closest homolog in C. albicans is
orf6.5842 and the following is the alignment between the two sequences:

Score = 94 (38.1 bits), Expect = 2.2e-10, P = 2.2e-10
Identities = 23/89 (25%), Positives = 50/89 (56%)

Sc:  4 NLSPLQQEVLDKYKQLSLDLKALDETIKELNYSQHRQQHSQQETVSPDEILQEMRDIEVK
       NLSP++Q++L +Y+ ++ +L + ++ L + + ++ +++ +R +E K
Ca: 24 NLSPIEQKILQQYQLMNNNLIKVSNELELLTNTTDEFGKGKGSSIHLVENLRQLETK

Sc: 64 IGLVGTLLKGSVYSLILQRKQ--EQESLG 90
       + V T KG+VYS++ + EQE+ G
Ca: 81 LVFVYTFFKGAVYSILNAQDYIAEQETNG 109



When smORF57 was used as bait, three proteins were found as interactors:
Dad1p, Dam1p, and Duo1p, which are part of a complex of proteins that
function at the kinetochore and are important for mitotic spindle
integrity (Enquist-Newman M. et al., 2001, Mol Biol Cell 12: 2601-2613).
The interactions between smORF57 and Dad1p, Dam1p, and Duo1p have been
confirmed by directed testing in the yeast two-hybrid system. Dam1p and
Duo1p have homologs in C. albicans, which are orf6.7374 and orf6.6397,
respectively (Cheeseman I.M. et al., J. Cell Biol 152: 197-212). In addition,
Dad1p has a homolog in C. albicans in Contig6-2505 (Enquist-Newman M.
et al., 2001, Mol Biol Cell 12: 2601-2613). The C. albicans genes coding
for Dad1p, Dam1p, and Duo1p were also used in the yeast two-hybrid system
to analyze the interactions. A diagram indicating the confirmed interactions
between smORF57 and Dad1, Dam1, and Duo1 is shown in Figure 6.

smORF57 also interacted with Mlp1p (myosin-like protein 1), a non-essential
protein localized to the nucleus close to the nuclear envelope, and with the
product of the YLR287C gene, a non-essential protein of unknown
function.

The interaction of smORF57 with the Dad1/Dam1/Duo1 complex suggests
that it is also involved in kinetochore function and mitotic spindle integrity.
Moreover, the conservation of residues, coupled with the lack of a human
ortholog, strongly suggests that smORF57 would be a target for the antifungal
treatment and compositions described herein. In addition, smORF57 could
also be used in diagnosing fungal infections, which is also provided by this
invention.

smORFs 172 and 181 (SEQ ID NO: 43 and 44, respectively).

These two smORFs also have homologs in C. albicans, and the
alignments are shown below:

smORF172 (SEQ ID NO: 43):

Score = 339 (124.4 bits), Expect = 2.4e-30, P = 2.4e-30
Identities = 63/77 (81%), Positives = 69/77 (89%), Frame = -3

Query:     1 MDALNSKEQQEFQKVVEQKQMKDFMRLYSNLVERCFTDCVNDFTTSKLTNKEQTCIMKCS 60
             MD LN KEQQEFQ++VEQKQMKDFM LYSNLV RCF DCVNDFT++ LT+KE +CI KCS
Sbjct: 31134 MDQLNVKEQQEFQQIVEQKQMKDFMNLYSNLVSRCFDDCVNDFTSNSLTSKETSCIAKCS 30955

Query:    61 EKFLKHSERVGQRFQEQ 77
             EKFLKHSERVGQRFQEQ
Sbjct: 30954 EKFLKHSERVGQRFQEQ 30904

smORF181 (SEQ ID NO: 44):

Score = 192 (72.6 bits), Expect = 8.8e-15, P = 8.8e-15
Identities = 38/85 (44%), Positives = 56/85 (65%), Frame = +1

Query:   10 RQVLSLYKEFIKNANQFNNYNFREYFLSKTRTTF 69
            +Q+L LYK+ ++ A +F+NYNF+EY K TF+ N + + + + E N L +L
Sbjct: 4054 KQILLLYKQLLEKAYKFDNYNFKEYSKRKIVETFKANKSLTNENEINQFYNEGINQLALL 4233

Query:   70 KRQSVISQMYTFDRLWEPLQGRKH 94
            RQ+ ISQ+YTFD+LWEPL +KH
Sbjct: 4234 YRQTTISQLYTFDKLWEPL--KKH 4302

smORF172 (SEQ ID NO: 43) was recently annotated (TIM9), and its gene
product is believed to be a translocase of the mitochondrial inner membrane
involved in mitochondrial protein import (Leuenberger D. et al., 1999.
Different import pathways through the mitochondrial intermembrane space for
inner membrane proteins. EMBO J. 18:4816-22).

The smORF181 is also conserved among fungal species thus implicating it as a 
target for antifungal treatment. 
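
The identity and positive percentages quoted in the alignments above can be rechecked from the reported counts. The following is a minimal sketch, assuming (it is not stated in the text) that the alignment program truncates percentages to whole numbers rather than rounding them; the function and variable names are illustrative only.

```python
def percent(matches: int, columns: int) -> int:
    """Whole-number percentage, truncated rather than rounded (assumed convention)."""
    if columns <= 0:
        raise ValueError("alignment length must be positive")
    return matches * 100 // columns


# (identities, positives, alignment length) as reported in the alignments above
ALIGNMENTS = {
    "smORF57": (23, 50, 89),
    "smORF172": (63, 69, 77),
    "smORF181": (38, 56, 85),
}

for name, (identities, positives, length) in ALIGNMENTS.items():
    print(
        f"{name}: identities {percent(identities, length)}%, "
        f"positives {percent(positives, length)}%"
    )
```

Under that truncation assumption, the recomputed values agree with the percentages printed in each alignment header.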

v. Additional smORF Validation.

To validate additional smORFs, the essentiality test was extended to
125 smORFs (Table 4) with the following results:

TABLE 4

SEQ ID        SEQ ID NO    SmORF No.      Essentiality Result
SC0013        13           smorf057       Confirmed essential
SC0034        34           smorf127       Possibly essential
SC0043        43           smorf172       Confirmed essential
SC0044        44           smorf181       Confirmed essential
SC0047        47           smorf207       Possibly essential
SC0052        52           [illegible]    Possibly essential
SC0060        60           [illegible]    Possibly essential
SC0068        68           [illegible]    Possibly essential
SC0089        89           [illegible]    Possibly essential
SC0104        104          [illegible]    Possibly essential
SC0108        108          [illegible]    Possibly essential
SC0111        111          [illegible]    Possibly essential
SC0184        184          [illegible]    Possibly essential
[illegible]   190          smorf136       Possibly essential
SC0329        329          smorf330       Possibly essential
SC0334        334          smorf335       Possibly essential
SC0654        654          smorf520       Possibly essential
SC0572        572          smorf639       Possibly essential
SC0562        562          smorf623       Possibly essential

Three smORFs were determined to be essential (SEQ ID NO: 13, 43
and 44). Sixteen other sequences, which are listed in Table 4, were
determined to encode possibly essential proteins. The remaining sequences of
the 125 analyzed were determined to be non-essential. The C. albicans
presumptive homolog of smORF57 (orf6.5842) was also disrupted, with the
result that it is essential. For the sixteen potentially essential
S. cerevisiae smORFs, essentiality needs to be confirmed by gene disruption in
the diploid strain followed by sporulation and tetrad analysis (SEQ ID NO: 34,
47, 52, 60, 68, 89, 104, 108, 111, 184, 190, 329, 334, 654, 572, and 562).
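
The counts reported above can be tallied as a quick consistency check. This is a minimal sketch using only the SEQ ID NOs and the total of 125 given in the text; the names `tally`, `CONFIRMED_ESSENTIAL` and `POSSIBLY_ESSENTIAL` are illustrative and not part of the disclosed method.

```python
TOTAL_TESTED = 125  # smORFs subjected to the essentiality test

# SEQ ID NOs as listed in the text
CONFIRMED_ESSENTIAL = {13, 43, 44}
POSSIBLY_ESSENTIAL = {
    34, 47, 52, 60, 68, 89, 104, 108,
    111, 184, 190, 329, 334, 654, 572, 562,
}


def tally(total: int, confirmed: set, possible: set) -> dict:
    """Count each essentiality class; the remainder is taken as non-essential."""
    overlap = confirmed & possible
    if overlap:
        raise ValueError(f"sequences listed in both classes: {sorted(overlap)}")
    return {
        "confirmed essential": len(confirmed),
        "possibly essential": len(possible),
        "non-essential": total - len(confirmed) - len(possible),
    }


summary = tally(TOTAL_TESTED, CONFIRMED_ESSENTIAL, POSSIBLY_ESSENTIAL)
for label, count in summary.items():
    print(f"{label}: {count} of {TOTAL_TESTED}")
```

With three confirmed and sixteen possibly essential sequences, the remainder of the 125 tested is 106 non-essential smORFs.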

IV. Pharmaceutical Compositions 

Once essential genes are identified, compounds and compositions can
be screened for their ability to modulate the activity of the gene. For example,
agents can be screened against C. albicans essential genes to determine whether
they have antifungal properties. Essential genes of C. albicans, for
example, that do not have plant and/or mammalian homologs can be used as
targets for the design and discovery of highly specific antifungal agents. Also
preferred would be the identification of essential fungal and bacterial genes
that have insect or plant homologs. Compounds and compositions that target
such genes could be used as insecticides and herbicides. In another
embodiment, essential genes which have mammalian homologs can be used as
targets for the design of anti-proliferative agents or agents which inhibit
proliferation or progression of the organism and/or its associated disease
process.

Candidate agents which can be screened and eventually used to treat
conditions and diseases associated with organisms such as C. albicans
encompass numerous chemical classes, though typically they are organic
molecules, preferably small organic molecules having a molecular weight of 
more than 100 and less than about 2,500 Daltons. Candidate agents are 
obtained from a wide variety of sources including libraries of synthetic or 
natural compounds. They can include peptides, macromolecules, small 
molecules, chemical and/or biological mixtures, and fungal, bacterial, or algal 
extracts. Such compounds, or molecules, may be biological, synthetic, 
organic, or even inorganic compounds, and may be obtained from several 
sources, including pharmaceutical companies and specialty suppliers of 
libraries (e.g., combinatorial libraries) of compounds. Libraries can also 
include peptide libraries. 

Methods of the present invention are well suited for screening libraries 
of compounds in multiwell plates (e.g., 96-, 384-, or higher density well 
plates), with a different test compound in each well. In particular, the methods 
may be employed with combinatorial libraries. A variety of combinatorial 
libraries of random-sequence oligonucleotides, polypeptides, or synthetic 
oligomers have been proposed. A number of small-molecule libraries have 
also been developed. 
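
As an illustration of the plate-based screening layout described above, the following sketch assigns each test compound of a library to its own well. It assumes the conventional 8 x 12 and 16 x 24 row-by-column layouts for 96- and 384-well plates and row-by-row filling; higher-density plates are not handled, and the compound names are hypothetical.

```python
import string

# Conventional (rows, columns) layouts; assumed, not specified in the text
PLATE_LAYOUTS = {96: (8, 12), 384: (16, 24)}


def well_name(index: int, plate_size: int = 96) -> str:
    """Convert a zero-based well index to a name such as 'A01' (row-major order)."""
    rows, cols = PLATE_LAYOUTS[plate_size]
    if not 0 <= index < rows * cols:
        raise ValueError(f"index {index} is outside a {plate_size}-well plate")
    row, col = divmod(index, cols)
    return f"{string.ascii_uppercase[row]}{col + 1:02d}"


def assign_compounds(compounds, plate_size: int = 96) -> dict:
    """Map each compound to (plate number, well), one compound per well."""
    layout = {}
    for i, compound in enumerate(compounds):
        plate, index = divmod(i, plate_size)
        layout[compound] = (plate + 1, well_name(index, plate_size))
    return layout


# Hypothetical library of 100 compounds spread over two 96-well plates
library = [f"cpd{i:03d}" for i in range(100)]
layout = assign_compounds(library, plate_size=96)
print(layout["cpd000"], layout["cpd095"], layout["cpd096"])
```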

Combinatorial libraries may be formed by a variety of solution-phase
or solid-phase methods in which mixtures of different subunits are added
stepwise to growing oligomers or parent compounds, until a desired compound is
synthesized. A library of increasing complexity can be formed in this manner,
for example, by pooling multiple choices of reagents with each additional
subunit step. Methods of preparing combinatorial libraries include, for
example, the use of microwaving, dynamic combinatorial chemistry (DCC), solid
phase organic synthesis (SPOS), and dual recursive deconvolution (DRED). See,
e.g., Borman, "Combinatorial Chemistry", Chem. Eng. News 49-58 (Aug. 27,
2001).
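
The growth in library complexity with each subunit step can be sketched as follows. This is an illustrative enumeration only, with hypothetical reagent labels; it assumes that every choice at one step can be combined with every choice at the others, which real chemistries may not permit.

```python
from itertools import product
from math import prod


def library_size(choices_per_step) -> int:
    """Number of distinct products when one reagent is chosen at each step."""
    return prod(len(choices) for choices in choices_per_step)


def enumerate_library(choices_per_step) -> list:
    """List every combination of reagents, one per subunit step."""
    return ["-".join(combo) for combo in product(*choices_per_step)]


# Hypothetical reagent choices for three successive subunit steps
steps = [
    ["A1", "A2", "A3"],
    ["B1", "B2"],
    ["C1", "C2", "C3", "C4"],
]

members = enumerate_library(steps)
print(library_size(steps), "library members, for example", members[:3])
```

Each additional step multiplies the library size by the number of reagent choices pooled at that step.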

The identity of library compounds with desired effects on the target
protein can be determined by conventional means, such as iterative synthesis 
methods in which sublibraries containing known residues in one subunit 
position only are identified as containing active compounds. 

Preferred compounds may have IC50 values between
about 15 and about 50 µM and preferably a low mammalian cellular toxicity
(e.g., GI50 > 100 µM). In the example of C. albicans, preferable compounds will
have antifungal activity of at least about 3-50 µM against C. albicans, as well
as against other fungal agents associated with disease. Preferred antifungal
agents will be those that are fungicidal, i.e., which cause the selective death
of the fungus. Preferred antibiotics will cause the death of the fungal organism
without detrimentally affecting the condition of the host organism infected by
the fungal organism (e.g., without causing cell death in the host
organism).
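
The preferred compound profile described above can be expressed as a simple filter over screening results. The sketch below is illustrative only: it treats the "about" values from the text as hard cutoffs (an assumption), and the `Compound` record and the example compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight_da: float  # molecular weight in Daltons
    ic50_um: float        # IC50 against the target, in micromolar
    gi50_um: float        # GI50 in mammalian cells, in micromolar


def meets_preferred_profile(c: Compound) -> bool:
    """Apply the size, potency and toxicity ranges quoted in the text as hard limits."""
    small_molecule = 100 < c.mol_weight_da < 2500
    potency_in_range = 15 <= c.ic50_um <= 50
    low_toxicity = c.gi50_um > 100
    return small_molecule and potency_in_range and low_toxicity


# Hypothetical screening results
hits = [
    Compound("hypothetical-1", 350.0, 20.0, 150.0),
    Compound("hypothetical-2", 350.0, 20.0, 80.0),    # too toxic to mammalian cells
    Compound("hypothetical-3", 3000.0, 20.0, 150.0),  # outside the size range
]

preferred = [c.name for c in hits if meets_preferred_profile(c)]
print(preferred)
```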

Generally, the preferred compositions and methods provided herein are
directed at preventing and treating infections caused by, but not limited to,
Chytridiomycetes, Hyphochytridiomycetes, Plasmodiophoromycetes,
Oomycetes, Zygomycetes, Ascomycetes, and Basidiomycetes. Fungal
infections which can be inhibited or treated with compositions provided herein
include but are not limited to: Candidiasis, including but not limited to
onychomycosis, chronic mucocutaneous candidiasis, oral candidiasis,
epiglottitis, esophagitis, gastrointestinal infections, and genitourinary
infections caused, for example, by any Candida species, including but not
limited to Candida albicans, Candida tropicalis, Candida (Torulopsis) glabrata,
Candida parapsilosis, Candida lusitaniae, Candida rugosa and Candida
pseudotropicalis; Aspergillosis, including but not limited to granulocytopenia,
caused, for example, by Aspergillus spp., including but not limited to
Aspergillus fumigatus, Aspergillus flavus, Aspergillus niger and Aspergillus
terreus; Zygomycosis, including but not limited to pulmonary, sinus and
rhinocerebral infections caused by, for example, zygomycetes such as Mucor,
Rhizopus spp., Absidia, Rhizomucor, Cunninghamella, Saksenaea, Basidiobolus
and Conidiobolus; Cryptococcosis, including but not limited to infections of
the central nervous system (meningitis) and infections of the respiratory tract
caused by, for example, Cryptococcus neoformans; Trichosporonosis caused
by, for example, Trichosporon beigelii; Pseudallescheriasis caused by, for
example, Pseudallescheria boydii; Fusarium infection caused by, for example,
Fusarium species such as Fusarium solani, Fusarium moniliforme and Fusarium
proliferatum; and other infections such as those caused by, for example,
Penicillium spp. (generalized subcutaneous abscesses), Drechslera, Bipolaris,
Exserohilum spp., Paecilomyces lilacinum, Exophiala jeanselmei (cutaneous
nodules), Malassezia furfur (folliculitis), Alternaria (cutaneous nodular
lesions), Aureobasidium pullulans (splenic and disseminated infection),
Rhodotorula spp. (disseminated infection), Chaetomium spp. (empyema),
Torulopsis candida (fungemia), Curvularia spp. (nasopharyngeal infection),
Cunninghamella spp. (pneumonia), H. capsulatum, B. dermatitidis,
Coccidioides immitis, Sporothrix schenckii, Paracoccidioides brasiliensis, and
Geotrichum candidum (disseminated infection).

Treating "fungal infections" as used herein refers to the treatment of
conditions resulting from fungal infections. Therefore, contemplated is the
treatment of, for example, pneumonia, nasopharyngeal infections,
disseminated infections and other conditions listed above and known in the art
by using the compositions provided herein. In preferred embodiments,
treatments and sanitization of areas with the compositions provided herein can
be used to treat immuno-compromised patients or areas where there are such
patients. Where it is desired to identify the particular fungi resulting in the
infection, techniques known in the art may be used.

One of skill in the art will readily appreciate that the methods 
described herein also can be used for diagnostic applications. A diagnostic as 
used herein is a compound or method that assists in the identification and 
characterization of a health or disease state in humans or other animals, by a 
product of a gene identified by a disclosed method. The genes and
gene products thus identified are useful in vitro tools for determining
fungal infection.

V. Antisense Compositions and Use Thereof 

In another embodiment, antisense compounds, compositions and 
methods are provided for modulating the expression of genes identified by the 
above-described methods. Preferable antisense compounds are those which 
target nucleic acids identified using a systematic in silico discovery method 
disclosed herein. Preferred antisense compounds can target, for example, SEQ 
ID NOS: 1-119 (See Table 2). Of those, most preferred are agents that target 
essential genes such as smORF57 (SEQ ID NO: 13). 

It is preferred to target specific nucleic acids for antisense. "Targeting"
an antisense compound to a particular nucleic acid would preferably be to a
nucleic acid that encodes a protein, wherein the nucleic acid is one identified
by a systematic in silico process disclosed herein. The gene can be from a
pathogenic organism. The targeting includes determination of a site or sites
within the target gene for the antisense reaction (e.g., joinder of the sense and
antisense strands to thereby modulate function of the gene or gene transcript).
Preferred antisense compounds are those that recognize and bind with a site
encompassing the translation initiation or termination codon of the open
reading frame (ORF) of the gene. Since, as is known in the art, the translation
initiation codon is typically 5'-AUG (in transcribed mRNA molecules; 5'-ATG
in the corresponding DNA molecule), the translation initiation codon is also
referred to as the "AUG codon," the "start codon" or the "AUG start codon".
A minority of genes have a translation initiation codon having the RNA
sequence 5'-GUG, 5'-UUG or 5'-CUG, and 5'-AUA, 5'-ACG and 5'-CUG have
been shown to function in vivo. Thus, the terms "translation initiation codon"
and "start codon" can encompass many codon sequences, even though the
initiator amino acid in each instance is typically methionine (in eukaryotes) or
formylmethionine (in prokaryotes).

It is also known in the art that eukaryotic and prokaryotic genes may
have two or more alternative start codons, any one of which may be
preferentially utilized for translation initiation in a particular cell type or
tissue, or under a particular set of conditions. In the context of the invention,
"start codon" and "translation initiation codon" refer to the codon or codons
that are used in vivo to initiate translation of an mRNA molecule transcribed
from a gene encoding a protein which was identified by a systematic in silico
method disclosed herein or one of the sequences disclosed herein.

A translation termination codon (or "stop codon") of a gene's transcript
may have one of three sequences, i.e., 5'-UAA, 5'-UAG and 5'-UGA (the
corresponding DNA sequences are 5'-TAA, 5'-TAG and 5'-TGA,
respectively). The terms "start codon region" and "translation initiation codon
region" refer to a portion of such an mRNA or gene that encompasses from
about 25 to about 50 contiguous nucleotides in either direction (i.e., 5' or 3')
from a translation initiation codon. Similarly, the terms "stop codon region"
and "translation termination codon region" refer to a portion of such an mRNA
or gene that encompasses from about 25 to about 50 contiguous nucleotides in
either direction (i.e., 5' or 3') from a translation termination codon. Preferred
antisense compositions would recognize and bind to areas containing a
termination codon and/or an initiation codon of any target gene or the mRNA
transcript it encodes.
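
The start codon region and stop codon region defined above can be extracted from a transcript as sketched below. The sketch makes simplifying assumptions that are not part of the text: it takes the first AUG as the start codon (alternative start codons are ignored), it takes the first in-frame stop codon as the terminator, and it uses a flank of 25 nucleotides, the lower end of the stated range. The example transcript is synthetic.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orf(mrna: str, start_codon: str = "AUG"):
    """Return (start_index, stop_index) for the first start codon and the
    first in-frame stop codon after it, or None if either is missing."""
    start = mrna.find(start_codon)
    if start == -1:
        return None
    for i in range(start + 3, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i
    return None


def codon_region(mrna: str, codon_index: int, flank: int = 25) -> str:
    """Return the codon plus up to `flank` nucleotides on each side,
    clipped at the ends of the transcript."""
    lo = max(0, codon_index - flank)
    hi = min(len(mrna), codon_index + 3 + flank)
    return mrna[lo:hi]


# Synthetic transcript: 30 nt leader, AUG, ten GCU codons, UAA, 30 nt trailer
mrna = "G" * 30 + "AUG" + "GCU" * 10 + "UAA" + "C" * 30
start, stop = find_orf(mrna)
start_region = codon_region(mrna, start)
stop_region = codon_region(mrna, stop)
print(start, stop, len(start_region), len(stop_region))
```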

The open reading frame (ORF) or "coding region," which is known in
the art to refer to the region between the translation initiation codon and the
translation termination codon, is also a region that may be a preferred target
of the antisense compounds or compositions. Other target regions include the
5' untranslated region (5'UTR), known in the art to refer to the portion of an
mRNA in the 5' direction from the translation initiation codon, and thus
including nucleotides between the 5' cap site and the translation initiation
codon of an mRNA or corresponding nucleotides on the gene, and the 3'
untranslated region (3'UTR), known in the art to refer to the portion of an
mRNA in the 3' direction from the translation termination codon, and thus
including nucleotides between the translation termination codon and 3' end of
an mRNA or corresponding nucleotides on the gene. The 5' cap of an mRNA
comprises an N7-methylated guanosine residue joined to the 5'-most residue of
the mRNA via a 5'-5' triphosphate linkage. The 5' cap region of an mRNA is
considered to include the 5' cap structure itself, and the first 50 nucleotides
adjacent to the cap. The 5' cap region may also be a preferred target region for
an antisense compound or composition.
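
The target regions named above can be pictured as slices of a transcript. The following is a minimal sketch, assuming the positions of the start and stop codons are already known; counting both codons as part of the coding region is a choice made here for illustration, and the names and the example transcript are hypothetical.

```python
from typing import NamedTuple


class TranscriptRegions(NamedTuple):
    cap_region: str     # first nucleotides adjacent to the 5' cap
    five_utr: str       # everything 5' of the start codon
    coding_region: str  # start codon through stop codon (assumed inclusive)
    three_utr: str      # everything 3' of the stop codon


def partition_transcript(mrna: str, start_index: int, stop_index: int,
                         cap_len: int = 50) -> TranscriptRegions:
    """Split an mRNA into the candidate antisense target regions."""
    if not 0 <= start_index < stop_index <= len(mrna) - 3:
        raise ValueError("codon positions do not fit the transcript")
    if (stop_index - start_index) % 3:
        raise ValueError("stop codon is not in frame with the start codon")
    return TranscriptRegions(
        cap_region=mrna[:cap_len],
        five_utr=mrna[:start_index],
        coding_region=mrna[start_index:stop_index + 3],
        three_utr=mrna[stop_index + 3:],
    )


# Synthetic transcript: 60 nt 5'UTR, AUG, five GCU codons, UGA, 20 nt 3'UTR
transcript = "G" * 60 + "AUG" + "GCU" * 5 + "UGA" + "C" * 20
regions = partition_transcript(transcript, start_index=60, stop_index=78)
print({name: len(seq) for name, seq in regions._asdict().items()})
```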

In the instance of more complex eukaryotic organisms, the genes are
composed of introns and exons, with the exons containing the material that
will encode the protein product of the gene. The intronic material, although
transcribed from the gene to produce the mRNA, will be excised from the
mRNA transcript prior to its translation into a protein. The exons are spliced
together to form a continuous mRNA sequence. The mRNA splice sites, i.e.,
intron-exon junctions, may also be preferred target regions of antisense
compounds and compositions, and are particularly useful in situations where
aberrant splicing is implicated in disease, or where an overproduction of a
particular mRNA splice product is implicated in disease. Aberrant fusion
junctions due to rearrangements or deletions are also preferred targets. It has
also been found that introns can be effective, and therefore preferred,
target regions for antisense compounds targeted, for example, to DNA or
pre-mRNA.

Once one or more target sites are identified in the genes identified
using a systematic discovery process disclosed herein, oligonucleotides are
chosen which are sufficiently complementary to the target, i.e., hybridize
sufficiently well and with sufficient specificity, to produce the desired
biological outcome (e.g., inhibition of microorganism proliferation or
progression, inhibition and/or prevention of the disease or condition induced
by the microorganism, modulation of the activity of the targeted gene).

In the context of this invention, "hybridization" means hydrogen 
bonding, which may be Watson-Crick, Hoogsteen or reversed Hoogsteen 
hydrogen bonding, between complementary nucleoside or nucleotide bases. 
For example, adenine (A) and thymine (T) are complementary nucleobases, 
which pair through the formation of hydrogen bonds. "Complementary," as 
used herein, refers to the capacity for precise pairing between two nucleotides. 
For example, if a nucleotide at a certain position of an oligonucleotide is 
capable of hydrogen bonding with a nucleotide at the same position of a DNA 
or RNA molecule, then the oligonucleotide and the DNA or RNA are 
considered to be complementary to each other at that position. The 
oligonucleotide and the DNA or RNA are complementary to each other when 
a sufficient number of corresponding positions in each molecule are occupied 
by nucleotides which can hydrogen bond with each other. It is understood in 
the art that the sequence of an antisense compound need not be 100% 
complementary to that of its target nucleic acid to be specifically hybridizable. 
An antisense compound is specifically hybridizable when binding of the 
compound to the target DNA or RNA molecule interferes with the normal 
function of the target DNA or RNA to cause a loss of utility, and there is a 
sufficient degree of complementarity to avoid non-specific binding of the 
antisense compound or composition to non-target sequences under conditions 
in which specific binding is desired. Preferred conditions for specific binding
are physiological conditions in the case of in vivo assays or therapeutic
treatment, and in the case of in vitro assays, the conditions under which the
assays are performed.
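
Position-by-position complementarity, as defined above, can be scored as in the sketch below. It models Watson-Crick pairing only (Hoogsteen and reversed Hoogsteen pairing are not considered), treats U and T as equivalent, and does not attempt to predict whether an oligonucleotide is specifically hybridizable under any particular conditions. The function names and sequences are illustrative.

```python
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def to_dna(seq: str) -> str:
    """Normalize to upper-case DNA letters so RNA and DNA can be compared."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Perfectly complementary antisense strand, written 5' to 3'."""
    return "".join(WATSON_CRICK[base] for base in reversed(to_dna(seq)))


def complementarity(oligo: str, target_site: str) -> float:
    """Fraction of positions at which the oligo (5'->3') can pair with the
    target site (5'->3'); the strands pair antiparallel."""
    oligo, site = to_dna(oligo), to_dna(target_site)
    if len(oligo) != len(site):
        raise ValueError("oligo and target site must have the same length")
    paired = sum(WATSON_CRICK[o] == t for o, t in zip(oligo, reversed(site)))
    return paired / len(oligo)


site = "AUGGCUAGCUAA"                   # hypothetical 12-nt target site in an mRNA
perfect = reverse_complement(site)      # fully complementary antisense oligo
one_mismatch = "G" + perfect[1:]        # same oligo with its first base changed
print(perfect, complementarity(perfect, site), complementarity(one_mismatch, site))
```

An oligonucleotide scoring below 1.0 is not fully complementary, which, as noted above, does not by itself rule out specific hybridization.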

Preferred antisense compounds and compositions contemplated would 
be for use as research reagents and diagnostics. For example, antisense 
oligonucleotides, which are able to inhibit gene expression, are often used by 
those of ordinary skill to elucidate the function of particular genes. Antisense 
compounds and compositions are also used, e.g., to distinguish between 
functions of various members of a biological pathway. Antisense modulation 
has, therefore, been harnessed for research use. 

Oligonucleotides have been employed as therapeutic moieties in the
treatment of disease states in animals and man. It is thus established that
oligonucleotides can be useful therapeutic modalities that can be configured
for use in treatment regimes for cells, tissues and animals,
especially humans. In the context of this invention, the term "oligonucleotide"
refers to an oligomer or polymer of ribonucleic acid (RNA) or
deoxyribonucleic acid (DNA) or mimetics thereof. This term includes
oligonucleotides composed of naturally occurring nucleobases, sugars and
covalent internucleoside (backbone) linkages as well as oligonucleotides
having non-naturally-occurring portions which function similarly. Such
modified or substituted oligonucleotides are often preferred over native forms
because of desirable properties such as, e.g., enhanced cellular uptake,
enhanced affinity for nucleic acid target and increased stability in the presence
of nucleases.

While antisense oligonucleotides are a preferred form of antisense
compound, the present invention comprehends other oligomeric antisense
compounds, including but not limited to oligonucleotide mimetics such as are
described below. The antisense compounds in accordance with this invention
preferably comprise from about 8 to about 30 nucleobases (i.e., from about 8
to about 30 linked nucleosides). The antisense compounds can be longer than
30 (e.g., 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more as well
as ranges in between). However, more preferred antisense compounds
comprise from about 12 to about 25 nucleobases.
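
Candidate antisense oligonucleotides in the more preferred length range can be enumerated by sliding a window over a chosen target region, as in the sketch below. This is an illustration under stated assumptions: every window of 12 to 25 nucleobases is listed without any ranking for specificity or stability, the oligonucleotides are written as unmodified DNA, and the target region shown is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Antisense sequence (5'->3', DNA letters) for an RNA or DNA target."""
    dna = seq.upper().replace("U", "T")
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def candidate_oligos(target_region: str, min_len: int = 12, max_len: int = 25):
    """Return (start, length, oligo) for every window in the length range."""
    candidates = []
    for length in range(min_len, max_len + 1):
        for start in range(len(target_region) - length + 1):
            window = target_region[start:start + length]
            candidates.append((start, length, reverse_complement(window)))
    return candidates


# Hypothetical 30-nt region around a start codon
region = "AUGGCUAGCUAAGGCCUUAAGGCCAUGCAU"
oligos = candidate_oligos(region)
print(len(oligos), "candidates; first:", oligos[0])
```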

As is known in the art, a nucleoside is a base-sugar combination. The
base portion of the nucleoside is normally a heterocyclic base. The two most
common classes of such heterocyclic bases are the purines and the
pyrimidines. Nucleotides are nucleosides that further include a phosphate
group covalently linked to the sugar portion of the nucleoside. For those
nucleosides that include a pentofuranosyl sugar, the phosphate group can be
linked to either the 2', 3' or 5' hydroxyl moiety of the sugar. In forming
oligonucleotides, the phosphate groups covalently link adjacent nucleosides to
one another to form a linear polymeric compound. In turn, the respective ends
of this linear polymeric structure can be further joined to form a circular
structure. However, open linear structures are generally preferred for use as
antisense compounds or in antisense compositions. Within the
oligonucleotide structure, the phosphate groups are commonly referred to as
forming the internucleoside backbone of the oligonucleotide. The normal
linkage or backbone of RNA and DNA is a 3' to 5' phosphodiester linkage.

Specific examples of preferred antisense compounds useful in this
invention include oligonucleotides containing modified backbones or
non-natural internucleoside linkages. As defined in this specification,
oligonucleotides having modified backbones include those that retain a
phosphorus atom in the backbone and those that do not have a phosphorus
atom in the backbone. For the purposes of this specification, and as
sometimes referenced in the art, modified oligonucleotides that do not have a
phosphorus atom in their internucleoside backbone can also be considered to
be oligonucleosides.

Preferred modified oligonucleotide backbones for use in antisense
compounds and compositions include, for example, phosphorothioates, chiral
phosphorothioates, phosphorodithioates, phosphotriesters,
aminoalkylphosphotriesters, methyl and other alkyl phosphonates including
3'-alkylene phosphonates and chiral phosphonates, phosphinates,
phosphoramidates including 3'-amino phosphoramidate and
aminoalkylphosphoramidates, thionophosphoramidates,
thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates
having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having
inverted polarity wherein the adjacent pairs of nucleoside units are linked 3'-5'
to 5'-3' or 2'-5' to 5'-2'. Various salts, mixed salts and free acid forms are also
included. For additional details on preparing such phosphorus-containing
linkages, see, for example, U.S. Pat. Nos.: 3,687,808; 4,469,863; 4,476,301;
5,023,243; 5,177,196; 5,188,897; 5,264,423; 5,276,019; 5,278,302; 5,286,717;
5,321,131; 5,399,676; 5,405,939; 5,453,496; 5,455,233; 5,466,677; 5,476,925;
5,519,126; 5,536,821; 5,541,306; 5,550,111; 5,563,253; 5,571,799; 5,587,361;
and 5,625,050.

Preferred modified oligonucleotide backbones that do not include a
phosphorus atom may have backbones that are formed by short chain alkyl or
cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl
internucleoside linkages, or one or more short chain heteroatomic or
heterocyclic internucleoside linkages. These include those having morpholino
linkages (formed in part from the sugar portion of a nucleoside); siloxane
backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and
thioformacetyl backbones; methylene formacetyl and thioformacetyl
backbones; alkene containing backbones; sulfamate backbones;
methyleneimino and methylenehydrazino backbones; sulfonate and
sulfonamide backbones; amide backbones; and others having mixed N, O, S
and CH2 component parts. For methods of preparing modified oligonucleotide
backbones that lack phosphorus atoms, see, e.g., U.S. Pat. Nos.: 5,034,506;
5,166,315; 5,185,444; 5,214,134; 5,216,141; 5,235,033; 5,264,562; 5,264,564;
5,405,938; 5,434,257; 5,466,677; 5,470,967; 5,489,677; 5,541,307; 5,561,225;
5,596,086; 5,602,240; 5,610,289; 5,602,240; 5,608,046; 5,610,289; 5,618,704;
5,623,070; 5,663,312; 5,633,360; 5,677,437; and 5,677,439.

In other preferred oligonucleotide mimetics, both
the sugar and the internucleoside linkage, i.e., the backbone, of the nucleotide
units are replaced with novel groups. The base units are maintained for
hybridization with an appropriate nucleic acid target compound. One such
oligomeric compound, an oligonucleotide mimetic that has been shown to
have excellent hybridization properties, is referred to as a peptide nucleic acid
(PNA). In PNA compounds, the sugar-backbone of an oligonucleotide is
replaced with an amide containing backbone, in particular an
aminoethylglycine backbone. The nucleobases are retained and are bound
directly or indirectly to aza nitrogen atoms of the amide portion of the
backbone. For discussion of such methods, see, for example, U.S. Pat. Nos.
5,539,082; 5,714,331; and 5,719,262 and Nielsen et al., Science, 1991, 254:
1497-1500.

Most preferred embodiments of the invention are oligonucleotides with
phosphorothioate backbones and oligonucleosides with heteroatom backbones,
and in particular -CH2-NH-O-CH2-, -CH2-N(CH3)-O-CH2-
[known as a methylene (methylimino) or MMI backbone],
-CH2-O-N(CH3)-CH2-, -CH2-N(CH3)-N(CH3)-CH2- and
-O-N(CH3)-CH2-CH2- [wherein the native phosphodiester backbone is
represented as -O-P-O-CH2-] and amide backbones such as those described
in U.S. Pat. No. 5,602,240. Also preferred are oligonucleotides having
morpholino backbone structures, such as those described in U.S. Pat. No.
5,034,506.

Modified oligonucleotides used as antisense compounds or in antisense
compositions as contemplated herein may also contain one or more substituted
sugar moieties. Preferred oligonucleotides comprise one of the following at
the 2' position: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl;
O-, S- or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and
alkynyl may be substituted or unsubstituted C1 to C10 alkyl or C2 to C10
alkenyl and alkynyl. Particularly preferred are O[(CH2)nO]mCH3, O(CH2)nOCH3,
O(CH2)nNH2, O(CH2)nCH3, O(CH2)nONH2, and O(CH2)nON[(CH2)nCH3]2,
where n and m are from 1 to about 10. Other preferred oligonucleotides may
comprise one of the following at the 2' position: C1 to C10 lower alkyl,
substituted lower alkyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH3,
OCN, Cl, Br, CN, CF3, OCF3, SOCH3, SO2CH3, ONO2, NO2, N3, NH2,
heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino,
substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a
group for improving the pharmacokinetic properties of an oligonucleotide, or a
group for improving the pharmacodynamic properties of an oligonucleotide,
and other substituents having similar properties. A preferred modification
includes 2'-methoxyethoxy (2'-O-CH2-CH2-OCH3, also known as
2'-O-(2-methoxyethyl) or 2'-MOE) (Martin et al., Helv. Chim. Acta, 1995, 78:
486-504), i.e., an alkoxyalkoxy group. Another preferred modification includes
2'-dimethylaminooxyethoxy (i.e., an O(CH2)2ON(CH3)2 group, also known as
2'-DMAOE) and 2'-dimethylaminoethoxyethoxy (also known in the art as
2'-O-dimethylaminoethoxyethyl or 2'-DMAEOE).

Other preferred modifications to the antisense compounds
contemplated include 2'-methoxy (2'-O-CH3), 2'-aminopropoxy
(2'-OCH2CH2CH2NH2) and 2'-fluoro (2'-F). Similar modifications may also be
made at other positions on the oligonucleotide, particularly at the 3' position of
the sugar on the 3' terminal nucleotide or in 2'-5' linked oligonucleotides and
the 5' position of the 5' terminal nucleotide. Oligonucleotides may also have
sugar mimetics, such as cyclobutyl moieties in place of the pentofuranosyl sugar.
For methods of preparing such modified sugar structures, see, for example,
U.S. Pat. Nos.: 4,981,957; 5,118,800; 5,319,080; 5,359,044; 5,393,878;
5,446,137; 5,466,786; 5,514,785; 5,519,134; 5,567,811; 5,576,427; 5,591,722;
5,597,909; 5,610,300; 5,627,053; 5,639,873; 5,646,265; 5,658,873; 5,670,633;
and 5,700,920.

Oligonucleotides may also include nucleobase (often referred to in the
art simply as "base") modifications or substitutions. As used herein,
"unmodified" or "natural" nucleobases include the purine bases adenine (A)
and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and
uracil (U). The invention also contemplates the use of modified nucleobases
in the antisense compounds and compositions. Such modified nucleobases
include other synthetic and natural nucleobases, such as 5-methylcytosine
(5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine,
6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and
other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and
2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine,
6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo,
8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and
guanines, 5-halo (e.g., particularly 5-bromo, 5-trifluoromethyl) and other
5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine,
8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and
3-deazaguanine and 3-deazaadenine. Additional nucleobases would be known
to the skilled artisan. See, for example, U.S. Pat. No. 3,687,808; The Concise
Encyclopedia Of Polymer Science And Engineering, 858-859
(Kroschwitz, J. I., ed. John Wiley & Sons, 1990); Englisch et al.,
Angewandte Chemie, v.30, p. 613 (International Edition, 1991); and
Sanghvi, Y. S., Chapter 15, Antisense Research and Applications,
289-302 (Crooke et al., CRC Press, 1993). Certain of these nucleobases are
particularly useful for increasing the binding affinity of the oligomeric
compounds of the invention. These include 5-substituted pyrimidines,
6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including
2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine.
5-methylcytosine substitutions have been shown to increase nucleic acid duplex
stability by 0.6-1.2°C (Sanghvi, Y. S., et al., 1993) and are presently preferred
base substitutions, even more particularly when combined with
2'-O-methoxyethyl sugar modifications.

Another oligonucleotide modification contemplated for use in the
antisense compounds and compositions involves chemically linking to the
oligonucleotide one or more moieties or conjugates that enhance the activity,
cellular distribution or cellular uptake of the oligonucleotide. Such moieties
include but are not limited to lipid moieties such as a cholesterol moiety
(Letsinger et al., Proc. Natl. Acad. Sci. USA, 1989, 86: 6553-6), cholic acid
(Manoharan et al., Bioorg. Med. Chem. Lett., 1994, 4: 1053-60), a thioether,
e.g., hexyl-S-tritylthiol (Manoharan et al., Ann. N.Y. Acad. Sci., 1992, 660:
306-9; and Manoharan et al., Bioorg. Med. Chem. Lett., 1993, 3: 2765-70), a
thiocholesterol (Oberhauser et al., Nucl. Acids Res., 1992, 20: 533-8), an
aliphatic chain, e.g., dodecandiol or undecyl residues (Saison-Behmoaras et
al., EMBO J., 1991, 10: 1111-8; Kabanov et al., FEBS Lett., 1990, 259: 327-30;
and Svinarchuk et al., Biochimie, 1993, 75: 49-54), a phospholipid, e.g.,
di-hexadecyl-rac-glycerol or triethyl-ammonium
1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate (Manoharan et al.,
Tetrahedron Lett., 1995, 36: 3651-4; and Shea et al., Nucl. Acids Res., 1990,
18: 3777-83), a polyamine or a polyethylene glycol chain (Manoharan et al.,
Nucleosides & Nucleotides, 1995, 14: 969-73), or adamantane acetic acid
(Manoharan et al., Tetrahedron Lett., 1995, 36: 3651-4), a palmityl moiety
(Mishra et al., Biochim. Biophys. Acta, 1995, 1264: 229-237), or an
octadecylamine or hexylamino-carbonyl-oxycholesterol moiety (Crooke et al.,
J. Pharmacol. Exp. Ther., 1996, 277: 923-937).

Methods for preparing such oligonucleotide conjugates would be
known in the art and include but are not limited to U.S. Pat. Nos.: 4,828,979;
4,948,882; 5,218,105; 5,525,465; 5,541,313; 5,545,730; 5,552,538; 5,578,717;
5,580,731; 5,580,731; 5,591,584; 5,109,124; 5,118,802; 5,138,045; 5,414,077;
5,486,603; 5,512,439; 5,578,718; 5,608,046; 4,587,044; 4,605,735; 4,667,025;
4,762,779; 4,789,737; 4,824,941; 4,835,263; 4,876,335; 4,904,582; 4,958,013;
5,082,830; 5,112,963; 5,214,136; 5,082,830; 5,112,963; 5,214,136; 5,245,022;
5,254,469; 5,258,506; 5,262,536; 5,272,250; 5,292,873; 5,317,098; 5,371,241;
5,391,723; 5,416,203; 5,451,463; 5,510,475; 5,512,667; 5,514,785; 5,565,552;
5,567,810; 5,574,142; 5,585,481; 5,587,371; 5,595,726; 5,597,696; 5,599,923;
5,599,928 and 5,688,941.

One or more of the positions in a given compound can be modified. It
is not necessary for all positions in a given compound to be uniformly
modified, and in fact more than one of the aforementioned modifications may
be incorporated in a single compound or even at a single nucleoside within an
oligonucleotide.

The present invention also includes antisense compounds that are
chimeric compounds. "Chimeric" antisense compounds or "chimeras," in the
context of this invention, are antisense compounds, particularly
oligonucleotides, which contain two or more chemically distinct regions, each
made up of at least one monomer unit, i.e., a nucleotide in the case of an
oligonucleotide compound. These oligonucleotides typically contain at least
one region wherein the oligonucleotide is modified so as to confer upon the
oligonucleotide increased resistance to nuclease degradation, increased cellular
uptake, and/or increased binding affinity for the target nucleic acid. An
additional region of the oligonucleotide may serve as a substrate for enzymes
capable of cleaving RNA:DNA or RNA:RNA hybrids. By way of example,
RNase H is a cellular endonuclease that cleaves the RNA strand of an
RNA:DNA duplex. Activation of RNase H, therefore, results in cleavage of
the RNA target, thereby greatly enhancing the efficiency of oligonucleotide
inhibition of gene expression. Consequently, comparable results can often be
obtained with shorter oligonucleotides when chimeric oligonucleotides are
used, compared to phosphorothioate deoxyoligonucleotides hybridizing to the
same target region. Cleavage of the RNA target can be routinely detected by
gel electrophoresis and, if necessary, associated nucleic acid hybridization
techniques known in the art.

Chimeric antisense compounds of the invention may be formed as
composite structures of two or more oligonucleotides, modified
oligonucleotides, oligonucleosides and/or oligonucleotide mimetics as
described above. Such compounds are also known as hybrids or
gapmers. Methods of preparing such hybrids include but are not limited to the
teachings of U.S. Pat. Nos.: 5,013,830; 5,149,797; 5,220,007; 5,256,775;
5,366,878; 5,403,711; 5,491,133; 5,565,350; 5,623,065; 5,652,355; 5,652,356;
and 5,700,922.
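
By way of illustration only, the definition of a chimeric compound given above (two or more chemically distinct regions, each of at least one monomer unit) can be expressed as a small data model. The sketch below is not part of the disclosed methods; the names Region, is_chimeric and total_length, the chemistry labels, and the example layout are all hypothetical and chosen solely for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """One chemically uniform stretch of an oligonucleotide (hypothetical model)."""
    chemistry: str  # free-text label, e.g. "2'-MOE" or "DNA" (illustrative values)
    length: int     # number of monomer units (nucleotides) in the region


def is_chimeric(regions: List[Region]) -> bool:
    """Return True when the regions cover two or more distinct chemistries
    and every region has at least one monomer unit."""
    if any(r.length < 1 for r in regions):
        raise ValueError("each region must contain at least one monomer unit")
    return len({r.chemistry for r in regions}) >= 2


def total_length(regions: List[Region]) -> int:
    """Total number of monomer units across all regions."""
    return sum(r.length for r in regions)


# Illustrative layout only: modified flanks around an unmodified central stretch.
example = [Region("2'-MOE", 5), Region("DNA", 10), Region("2'-MOE", 5)]
print(is_chimeric(example), total_length(example))
```

A uniformly modified oligonucleotide would be modeled as a single region and would therefore not be reported as chimeric.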

The antisense compounds contemplated herein may be conveniently 
and routinely made through the well-known technique of solid phase 
synthesis. The oligonucleotides can be prepared for example using the 
equipment and techniques of Applied Biosystems. Any other means for such 
synthesis known in the art may additionally or alternatively be employed.

The antisense compounds of the invention are synthesized in vitro and 
do not include antisense compositions of biological origin, or genetic vector 
constructs designed to direct the in vivo synthesis of antisense molecules. The 
compounds of the invention may also be admixed, encapsulated, conjugated or 
otherwise associated with other molecules, molecule structures or mixtures of
compounds, as for example, liposomes, receptor targeted molecules, oral, 
rectal, topical or other formulations, for assisting in uptake, distribution and/or 
absorption. Methods and preparations for such uptake, distribution and/or 
absorption assisting formulations include, but are not limited to, U.S. Pat. 
Nos.: 5,108,921; 5,354,844; 5,416,016; 5,459,127; 5,521,291; 5,543,158;
5,547,932; 5,583,020; 5,591,721; 4,426,330; 4,534,899; 5,013,556; 5,108,921;
5,213,804; 5,227,170; 5,264,221; 5,356,633; 5,395,619; 5,416,016; 5,417,978; 
5,462,854; 5,469,854; 5,512,295; 5,527,528; 5,534,259; 5,543,152; 5,556,948; 
5,580,575; and 5,595,756.

The contemplated antisense compounds and compositions disclosed
herein also include any pharmaceutically acceptable salts, esters, or salts of
such esters, or any other compound which, upon administration to an animal
including a human, is capable of providing (directly or indirectly) the
biologically active metabolite or residue thereof. Accordingly, for example,
the disclosure is also drawn to prodrugs and pharmaceutically acceptable salts
of the compounds of the invention, pharmaceutically acceptable salts of such
prodrugs, and other bioequivalents.

The term "prodrug" indicates a therapeutic agent that is prepared in an
inactive form that is converted to an active form (i.e., drug) within the body or
cells thereof by the action of endogenous enzymes or other chemicals and/or
conditions. In particular, prodrug versions of the oligonucleotides of the
invention are prepared as SATE [(S-acetyl-2-thioethyl) phosphate] derivatives
according to the methods disclosed for example in WO 93/24510 and in WO
94/26764.

The term "pharmaceutically acceptable salts" refers to physiologically
and pharmaceutically acceptable salts of the compounds of the invention: i.e.,
salts that retain the desired biological activity of the parent compound and do
not impart undesired toxicological effects thereto. The compounds for
modulating any of the disclosed genes, gene transcripts or proteins encoded
thereby include antisense compounds as well as other modulatory compounds.
Pharmaceutically acceptable base addition salts for use with antisense
as well as other modulatory compounds are formed with metals or amines,
such as alkali and alkaline earth metals or organic amines. Examples of
metals used as cations are sodium, potassium, magnesium, calcium, and the
like. Examples of suitable amines are N,N'-dibenzylethylenediamine,
chloroprocaine, choline, diethanolamine, dicyclohexylamine, ethylenediamine,
N-methylglucamine, and procaine (see, e.g., Berge et al., "Pharmaceutical
Salts," J. Pharm. Sci., 1977, 66: 1-19). The base addition salts of acidic
compounds are prepared by contacting the free acid form with a sufficient
amount of the desired base to produce the salt in the conventional manner. The
free acid form may be regenerated by contacting the salt form with an acid,
and isolating the free acid in a conventional manner. The free acid forms
differ from their respective salt forms somewhat in certain physical properties
such as solubility in polar solvents, but otherwise the salts are equivalent to
their respective free acid for purposes of the present invention. As used
herein, a "pharmaceutical addition salt" includes a pharmaceutically
acceptable salt of an acid form of one of the components of the compositions
of the invention. These include organic or inorganic acid salts of the amines.
Preferred acid salts are the hydrochlorides, acetates, salicylates, nitrates and
phosphates. Other suitable pharmaceutically acceptable salts are known in the
art and include basic salts of a variety of inorganic and organic acids, such as,
for example, with inorganic acids (e.g., hydrochloric acid, hydrobromic acid,
sulfuric acid or phosphoric acid); with organic carboxylic, sulfonic, sulfo or
phospho acids or N-substituted sulfamic acids, for example acetic acid,
propionic acid, glycolic acid, succinic acid, maleic acid, hydroxymaleic acid,
methylmaleic acid, fumaric acid, malic acid, tartaric acid, lactic acid, oxalic
acid, gluconic acid, glucaric acid, glucuronic acid, citric acid, benzoic acid,
cinnamic acid, mandelic acid, salicylic acid, 4-aminosalicylic acid,
2-phenoxybenzoic acid, 2-acetoxybenzoic acid, embonic acid, nicotinic acid or
isonicotinic acid; and with amino acids, such as the 20 alpha-amino acids
involved in the synthesis of proteins in nature, for example glutamic acid or
aspartic acid, and also with phenylacetic acid, methanesulfonic acid,
ethanesulfonic acid, 2-hydroxyethanesulfonic acid, ethane-1,2-disulfonic acid,
benzenesulfonic acid, 4-methylbenzenesulfonic acid, naphthalene-2-sulfonic
acid, naphthalene-1,5-disulfonic acid, 2- or 3-phosphoglycerate,
glucose-6-phosphate, N-cyclohexylsulfamic acid (with the formation of
cyclamates), or with other acid organic compounds, such as ascorbic acid.

Pharmaceutically acceptable salts of compounds may also be prepared
with a pharmaceutically acceptable cation. Suitable pharmaceutically
acceptable cations are well known in the art and include alkaline, alkaline
earth, ammonium and quaternary ammonium cations. Carbonates or hydrogen
carbonates are also possible.

For oligonucleotides, preferred examples of pharmaceutically
acceptable salts include but are not limited to (a) salts formed with cations
such as sodium, potassium, ammonium, magnesium, calcium, polyamines
such as spermine and spermidine, etc.; (b) acid addition salts formed with
inorganic acids, for example hydrochloric acid, hydrobromic acid, sulfuric
acid, phosphoric acid, nitric acid and the like; (c) salts formed with organic
acids such as, for example, acetic acid, oxalic acid, tartaric acid, succinic acid,
maleic acid, fumaric acid, gluconic acid, citric acid, malic acid, ascorbic acid,
benzoic acid, tannic acid, palmitic acid, alginic acid, polyglutamic acid,
naphthalenesulfonic acid, methanesulfonic acid, p-toluenesulfonic acid,
naphthalenedisulfonic acid, polygalacturonic acid, and the like; and (d) salts
formed from elemental anions such as chlorine, bromine, and iodine.

The antisense compounds and other modulatory compounds described
herein can be utilized in pharmaceutical compositions by adding an effective
amount of an antisense compound or other modulatory compound to a suitable
pharmaceutically acceptable diluent or carrier. Use of the compounds and
methods of the invention may also be useful prophylactically, e.g., to prevent
or delay infection, progression of the microorganism, or inflammation, for
example.

The antisense compounds of the invention are useful for research and
diagnostics, because these compounds hybridize to nucleic acids encoding a
gene identified using the systematic discovery technique or an mRNA
transcript thereof. Such hybridization allows sandwich and other
assays to be readily constructed to exploit this fact. Hybridization of the
antisense oligonucleotides of the invention with a nucleic acid encoding a
gene or gene transcript identified by a systematic discovery method can be
detected by means known in the art. Such means may include conjugation of
an enzyme to the oligonucleotide, radiolabelling of the oligonucleotide or any
other suitable detection means. Kits using such detection means for detecting
the level of a transcript of a gene in a sample may also be prepared.

The present invention also includes pharmaceutical compositions and
formulations that include the antisense compounds and other modulatory
compounds and compositions of the invention. The pharmaceutical
compositions of the present invention may be administered in a number of
ways depending upon whether local or systemic treatment is desired and upon
the area to be treated. Administration may be topical (including ophthalmic
and to mucous membranes including vaginal and rectal delivery), pulmonary
(e.g., by inhalation or insufflation of powders or aerosols, including by
nebulizer), intratracheal, intranasal, epidermal and transdermal, oral or
parenteral. Parenteral administration includes intravenous (i.v.), intraarterial,
subcutaneous (s.c.), intraperitoneal (i.p.) or intramuscular (i.m.) injection or
infusion; or intracranial (e.g., intrathecal or intraventricular) administration.
Oligonucleotides with at least one 2'-O-methoxyethyl modification are
believed to be particularly useful for oral administration.

Pharmaceutical compositions and formulations for topical
administration may include transdermal patches, ointments, lotions, creams,
gels, drops, suppositories, sprays, liquids and powders. Conventional
pharmaceutical carriers, aqueous, powder or oily bases, thickeners and the like
may be necessary or desirable. Coated condoms, gloves and the like may also
be useful.

Compositions and formulations for oral administration include 
powders or granules, suspensions or solutions in water or non-aqueous media, 
capsules, sachets or tablets. Thickeners, flavoring agents, diluents, 
emulsifiers, dispersing aids or binders may be desirable.

Compositions and formulations for parenteral, intrathecal or
intraventricular administration may include sterile aqueous solutions that may
also contain buffers, diluents and other suitable additives such as, but not
limited to, penetration enhancers, carrier compounds and other
pharmaceutically acceptable carriers or excipients.

Pharmaceutical compositions (e.g., gene, gene transcript or protein 
product modulatory agents as described herein) of the present invention 
include, but are not limited to, solutions, emulsions, and liposome-containing 
formulations. These compositions may be generated from a variety of 
components that include, but are not limited to, preformed liquids, self- 
emulsifying solids and self-emulsifying semisolids. 

The pharmaceutical formulations of the present invention, which may 
conveniently be presented in unit dosage form, may be prepared according to 
conventional techniques well known in the pharmaceutical industry. Such 
techniques include the step of bringing into association the active ingredients 
with the pharmaceutical carrier(s) or excipient(s). In general, the formulations 
are prepared by uniformly and intimately bringing into association the active 
ingredients with liquid carriers or finely divided solid carriers or both, and 
then, if necessary, shaping the product. 

The compositions of the present invention may be formulated into any 
of many possible dosage forms such as, but not limited to, tablets, capsules, 
liquid syrups, soft gels, suppositories, and enemas. The compositions of the 
present invention may also be formulated as suspensions in aqueous, non- 
aqueous or mixed media. Aqueous suspensions may further contain 
substances that increase the viscosity of the suspension including, for example, 
sodium carboxymethylcellulose, sorbitol and/or dextran. The suspension may
also contain stabilizers.

In one embodiment of the present invention, the pharmaceutical 
compositions may be formulated and used as foams. Pharmaceutical foams 
include formulations such as, but not limited to, emulsions, microemulsions, 
creams, jellies and liposomes. While basically similar in nature, these 
formulations vary in the components and the consistency of the final product. 
The preparation of such compositions and formulations is generally known to 
those skilled in the pharmaceutical and formulation arts and may be applied to 
the formulation of the compositions of the present invention. 

The compositions of the present invention may be prepared and
formulated as emulsions. Emulsions are typically heterogeneous systems of
one liquid dispersed in another in the form of droplets usually exceeding 0.1
µm in diameter. See, e.g., Idson, in Pharmaceutical Dosage Forms, v. 1,
p. 199 (Lieberman, Rieger and Banker (Eds.), 1988, Marcel Dekker, Inc., New
York); Rosoff, in Pharmaceutical Dosage Forms, v. 1, p. 245; Block in
Pharmaceutical Dosage Forms, v. 2, p. 335; Higuchi et al., in
Remington's Pharmaceutical Sciences 301 (Mack Publishing Co., Easton,
Pa., 1985). Emulsions are often biphasic systems comprising two
immiscible liquid phases intimately mixed and dispersed with each other. In
general, emulsions may be either of the water-in-oil (w/o) or of the oil-in-water
(o/w) variety. When an aqueous phase is finely divided into and dispersed as
minute droplets into a bulk oily phase, the resulting composition is called a
water-in-oil (w/o) emulsion. Alternatively, when an oily phase is finely divided
into and dispersed as minute droplets into a bulk aqueous phase, the resulting
composition is called an oil-in-water (o/w) emulsion. Emulsions may contain
components in addition to the dispersed phases, and the active drug may be
present as a solution in either the aqueous phase or the oily phase, or
itself as a separate phase. Pharmaceutical excipients such as emulsifiers,
stabilizers, dyes, and anti-oxidants may also be present in emulsions as
needed. Pharmaceutical emulsions may also be multiple emulsions that are
comprised of more than two phases such as, for example, in the case of
oil-in-water-in-oil (o/w/o) and water-in-oil-in-water (w/o/w) emulsions. Such
complex formulations often provide certain advantages that simple binary
emulsions do not. Multiple emulsions in which individual oil droplets of an
o/w emulsion enclose small water droplets constitute a w/o/w emulsion.
Likewise a system of oil droplets enclosed in globules of water stabilized in an
oily continuous phase provides an o/w/o emulsion.
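
The naming convention described above, in which the phases are listed from the innermost dispersed droplets outward to the continuous phase, can be illustrated with a short sketch. The function name emulsion_label and the phase names "water" and "oil" are hypothetical conveniences for this example and do not appear elsewhere in this disclosure.

```python
from typing import Sequence

# Abbreviations used in the text: w = aqueous phase, o = oily phase.
_ABBREVIATIONS = {"water": "w", "oil": "o"}


def emulsion_label(phases: Sequence[str]) -> str:
    """Build a label such as 'w/o' or 'w/o/w'.

    `phases` lists the phases from the innermost droplets to the
    continuous (external) phase.
    """
    if len(phases) < 2:
        raise ValueError("an emulsion needs at least two phases")
    for name in phases:
        if name not in _ABBREVIATIONS:
            raise ValueError(f"unknown phase: {name!r}")
    # Each droplet must sit in an immiscible surrounding phase.
    for inner, outer in zip(phases, phases[1:]):
        if inner == outer:
            raise ValueError("adjacent phases must alternate between oil and water")
    return "/".join(_ABBREVIATIONS[name] for name in phases)


# Water droplets inside oil droplets dispersed in water: a w/o/w emulsion.
print(emulsion_label(["water", "oil", "water"]))
```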

Emulsions are characterized by little or no thermodynamic stability.
Often, the dispersed or discontinuous phase of the emulsion is well dispersed
into the external or continuous phase and maintained in this form through the
means of emulsifiers or the viscosity of the formulation. Either of the phases
of the emulsion may be a semisolid or a solid, as is the case of emulsion-style
ointment bases and creams. Other means of stabilizing emulsions entail the
use of emulsifiers that may be incorporated into either phase of the emulsion.
Emulsifiers may broadly be classified into four categories: synthetic
surfactants, naturally occurring emulsifiers, absorption bases, and finely
dispersed solids (Idson, in Pharmaceutical Dosage Forms, v. 1, p. 199
(Lieberman, Rieger and Banker (Eds.), 1988, Marcel Dekker, Inc., New York)).

Synthetic surfactants, also known as surface active agents, have found
wide applicability in the formulation of emulsions and have been reviewed in
the literature (Rieger, in Pharmaceutical Dosage Forms, v. 1, p. 285; Idson,
in Pharmaceutical Dosage Forms, v. 1, p. 199). Surfactants are typically
amphiphilic and comprise a hydrophilic and a hydrophobic portion. The ratio
of the hydrophilic to the hydrophobic nature of the surfactant has been termed
the hydrophile/lipophile balance (HLB) and is a valuable tool in categorizing
and selecting surfactants in the preparation of formulations. Surfactants may
be classified into different classes based on the nature of the hydrophilic
group: nonionic, anionic, cationic and amphoteric (Rieger, in
Pharmaceutical Dosage Forms).
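
The text above does not state how an HLB value is computed. As an illustration only, one widely used convention for nonionic surfactants is Griffin's method, in which HLB = 20 x (molecular mass of the hydrophilic portion) / (molecular mass of the whole molecule), giving a value between 0 and 20. The sketch below assumes that convention; the function name griffin_hlb and its parameters are hypothetical, and other HLB scales exist.

```python
def griffin_hlb(hydrophilic_mass: float, total_mass: float) -> float:
    """Estimate HLB by Griffin's method (an assumed convention, nonionic surfactants).

    hydrophilic_mass: molecular mass of the hydrophilic portion of the molecule
    total_mass:       molecular mass of the whole surfactant molecule
    """
    if total_mass <= 0:
        raise ValueError("total_mass must be positive")
    if not 0 <= hydrophilic_mass <= total_mass:
        raise ValueError("hydrophilic_mass must lie between 0 and total_mass")
    # Scale the hydrophilic mass fraction onto the 0-20 HLB range.
    return 20.0 * hydrophilic_mass / total_mass


# A molecule whose hydrophilic portion makes up half of its mass
# falls in the middle of the scale.
print(griffin_hlb(50.0, 100.0))
```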

Naturally occurring emulsifiers used in emulsion formulations include
lanolin, beeswax, phosphatides, lecithin and acacia. Absorption bases possess
hydrophilic properties such that they can soak up water to form w/o emulsions
yet retain their semisolid consistencies, such as anhydrous lanolin and
hydrophilic petrolatum. Finely divided solids have also been used as good
emulsifiers, especially in combination with surfactants and in viscous
preparations. These include polar inorganic solids, such as heavy metal
hydroxides, non-swelling clays (e.g., bentonite, attapulgite, hectorite, kaolin,
montmorillonite, colloidal aluminum silicate and colloidal magnesium
aluminum silicate), pigments and nonpolar solids (e.g., carbon or glyceryl
tristearate).

A large variety of non-emulsifying materials are also included in
emulsion formulations and contribute to the properties of emulsions. These
include fats, oils, waxes, fatty acids, fatty alcohols, fatty esters, humectants,
hydrophilic colloids, preservatives and antioxidants (Block, in
Pharmaceutical Dosage Forms, v. 1, p. 385 (Lieberman, Rieger and Banker
(Eds.), 1988, Marcel Dekker, Inc., New York)).

Hydrophilic colloids or hydrocolloids include naturally occurring gums 
and synthetic polymers, such as polysaccharides (e.g., acacia, agar, alginic 
acid, carrageenan, guar gum, karaya gum, and tragacanth), cellulose
derivatives (e.g., carboxymethylcellulose and carboxypropylcellulose), and
synthetic polymers (e.g., carbomers, cellulose ethers, and carboxyvinyl 
polymers). These disperse or swell in water to form colloidal solutions that 
stabilize emulsions by forming strong interfacial films around the dispersed- 
phase droplets and by increasing the viscosity of the external phase. 

Since emulsions often contain a number of ingredients such as
carbohydrates, proteins, sterols and phosphatides that may readily support the
growth of microbes, these formulations often incorporate preservatives.






PATENT APPLICATION 
ATTY.DKT.NO.: 032796-090 

Preservatives commonly used in emulsion formulations include
methyl paraben, propyl paraben, quaternary ammonium salts, benzalkonium
chloride, esters of p-hydroxybenzoic acid, and boric acid. Antioxidants are
also commonly added to emulsion formulations to prevent deterioration of the
formulation. Antioxidants used may be free radical scavengers (e.g.,
tocopherols, alkyl gallates, butylated hydroxyanisole, butylated
hydroxytoluene), reducing agents (e.g., ascorbic acid and sodium
metabisulfite), or antioxidant synergists (e.g., citric acid, tartaric acid, and
lecithin).

The application of emulsion formulations via dermatological, oral and
parenteral routes and methods for their manufacture have been reviewed in the
literature (Idson, in Pharmaceutical Dosage Forms, v. 1, p. 199).
Emulsion formulations for oral delivery have been very widely used because
of their ease of formulation and their efficacy from an absorption and
bioavailability standpoint (Rosoff, in Pharmaceutical Dosage Forms, v. 1,
p. 245 (Lieberman, Rieger and Banker (Eds.), 1988, Marcel Dekker, Inc., New
York); Idson, in Pharmaceutical Dosage Forms). Mineral-oil base
laxatives, oil-soluble vitamins and high fat nutritive preparations are among
the materials that have commonly been administered orally as o/w emulsions.

In one embodiment of the present invention, the compositions of
oligonucleotides and nucleic acids are formulated as microemulsions. A
microemulsion may be defined as a system of water, oil and amphiphile which
is a single optically isotropic and thermodynamically stable liquid solution
(Rosoff, in Pharmaceutical Dosage Forms, v. 1, p. 245). Typically
microemulsions are systems that are prepared by first dispersing an oil in an
aqueous surfactant solution and then adding a sufficient amount of a fourth
component, generally an intermediate chain-length alcohol, to form a
transparent system. Therefore, microemulsions have also been described as
thermodynamically stable, isotropically clear dispersions of two immiscible
liquids that are stabilized by interfacial films of surface-active molecules
(Leung and Shah, in Controlled Release of Drugs: Polymers and
Aggregate Systems, 185-215 (Rosoff, M., Ed., 1989, VCH Publishers, New
York)). Microemulsions commonly are prepared via a combination of three to
five components that include oil, water, surfactant, cosurfactant and
electrolyte. Whether the microemulsion is of the water-in-oil (w/o) or an
oil-in-water (o/w) type is dependent on the properties of the oil and surfactant
used and on the structure and geometric packing of the polar heads and
hydrocarbon tails of the surfactant molecules (Schott, in Remington's
Pharmaceutical Sciences, 271 (Mack Publishing Co., Easton, Pa., 1985)).

Surfactants used in the preparation of microemulsions include, but are
not limited to, ionic surfactants, non-ionic surfactants, Brij 96,
polyoxyethylene oleyl ethers, polyglycerol fatty acid esters, tetraglycerol
monolaurate (ML310), tetraglycerol monooleate (MO310), hexaglycerol
monooleate (PO310), hexaglycerol pentaoleate (PO500), decaglycerol
monocaprate (MCA750), decaglycerol monooleate (MO750), decaglycerol
sesquioleate (SO750), decaglycerol decaoleate (DAO750), alone or in
combination with co-surfactants. The co-surfactant, usually a short-chain
alcohol such as ethanol, 1-propanol, and 1-butanol, serves to increase the
interfacial fluidity by penetrating into the surfactant film and consequently
creating a disordered film because of the void space generated among
surfactant molecules.

Microemulsions may, however, be prepared without the use of
co-surfactants, and alcohol-free self-emulsifying microemulsion systems are
known in the art. The aqueous phase may typically be, but is not limited to,
water, an aqueous solution of the drug, glycerol, PEG300, PEG400,
polyglycerols, propylene glycols, and derivatives of ethylene glycol. The oil
phase may include, but is not limited to, materials such as Captex 300, Captex
355, Capmul MCM, fatty acid esters, medium chain (C8-C12) mono-, di-, and
tri-glycerides, polyoxyethylated glyceryl fatty acid esters, fatty alcohols,
polyglycolized glycerides, saturated polyglycolized C8-C10 glycerides,
vegetable oils and silicone oil.

Microemulsions are particularly of interest from the standpoint of drug
solubilization and the enhanced absorption of drugs. Lipid based
microemulsions (both o/w and w/o) have been proposed to enhance the oral
bioavailability of drugs, including peptides (Constantinides et al., Pharm.
Res., 1994, 11: 1385-90; Ritschel, Meth. Find. Exp. Clin. Pharmacol., 1993,
13: 205). Microemulsions afford advantages of improved drug solubilization,
protection of drug from enzymatic hydrolysis, possible enhancement of drug
absorption due to surfactant-induced alterations in membrane fluidity and
permeability, ease of preparation, ease of oral administration over solid dosage
forms, improved clinical potency, and decreased toxicity (Constantinides et
al., 1994; Ho et al., J. Pharm. Sci., 1996, 85: 138-143). Often microemulsions
may form spontaneously when their components are brought together at
ambient temperature. This may be particularly advantageous when
formulating thermolabile drugs, peptides or oligonucleotides. Microemulsions
have also been effective in the transdermal delivery of active components in
both cosmetic and pharmaceutical applications. It is expected that the
microemulsion compositions and formulations of the present invention will
facilitate the increased systemic absorption of oligonucleotides and nucleic
acids and other active agents from the gastrointestinal tract, as well as improve
the local cellular uptake of oligonucleotides and nucleic acids and other active
agents within the gastrointestinal tract, vagina, buccal cavity and other areas of
administration.

Microemulsions of the present invention may also contain additional
components and additives such as sorbitan monostearate (Grill 3), Labrasol,
and penetration enhancers to improve the properties of the formulation and to
enhance the absorption of the oligonucleotides and nucleic acids of the present
invention. Penetration enhancers used in the microemulsions of the present
invention may be classified as belonging to one of five broad categories:
surfactants, fatty acids, bile salts, chelating agents, and non-chelating
non-surfactants (Lee et al., Crit. Rev. Therap. Drug Carrier Systems, 1991, p. 92).
Each of these classes has been discussed above.

There are many organized surfactant structures besides microemulsions
that have been studied and used for the formulation of drugs. These include
monolayers, micelles, bilayers and vesicles. Vesicles, such as liposomes, are 
useful because of their specificity and the duration of action. As used in the 
present invention, the term "liposome" means a vesicle composed of 
amphiphilic lipids arranged in a spherical bilayer or bilayers. 

Liposomes are unilamellar or multilamellar vesicles which have a
membrane formed from a lipophilic material and an aqueous interior. The
aqueous portion contains the composition to be delivered. Cationic liposomes
possess the advantage of being able to fuse to the cell wall. Non-cationic
liposomes, although not able to fuse as efficiently with the cell wall, are taken
up by macrophages in vivo. Selection of the appropriate liposome depending
on the agent to be encapsulated would be evident given what is known in the
art.

In order to cross mammalian skin, lipid vesicles must pass through a 
series of fine pores, each with a diameter less than 50 nm, under the influence 
of a suitable transdermal gradient. Therefore, it is desirable to use a liposome
that is highly deformable and able to pass through such fine pores. 

Further advantages of liposomes include: (a) liposomes obtained from
natural phospholipids are biocompatible and biodegradable; (b) liposomes can
incorporate a wide range of water and lipid soluble drugs; and (c) liposomes can
protect encapsulated drugs in their internal compartments from metabolism
and degradation (Rosoff, in Pharmaceutical Dosage Forms). Important
considerations in the preparation of liposome formulations are the lipid surface 
charge, vesicle size and the aqueous volume of the liposomes. 

Liposomes are useful for the transfer and delivery of active ingredients 
to the site of action. Because the liposomal membrane is structurally similar
to biological membranes, when liposomes are applied to a tissue, the
liposomes start to merge with the cellular membranes. As the merging of the
liposome and cell progresses, the liposomal contents are emptied into the cell 
where the active agent may act. 

Another embodiment also contemplates the use of liposomes for
topical administration. Advantages of such use include reduced side-effects
related to high systemic absorption of the administered drug, increased
accumulation of the administered drug at the desired target, and the ability to
administer a wide variety of drugs, both hydrophilic and hydrophobic, into the
skin. Several reports have detailed the ability of liposomes to deliver agents
including high-molecular weight DNA into the skin. Compounds including
analgesics, antibodies, hormones and high-molecular weight DNAs have been
administered to the skin. The majority of applications resulted in the targeting
of the upper epidermis.

Liposomes fall into two broad classes. Cationic liposomes are
positively charged liposomes that interact with the negatively charged DNA
molecules to form a stable complex. The positively charged DNA/liposome
complex binds to the negatively charged cell surface and is internalized in an
endosome. Due to the acidic pH within the endosome, the liposomes are
ruptured, releasing their contents into the cell cytoplasm (Wang et al.,
Biochem. Biophys. Res. Commun., 1987, 147: 980-5).

Liposomes that are pH-sensitive or negatively charged entrap DNA
rather than complex with it. Since both the DNA and the lipid are similarly
charged, repulsion rather than complex formation occurs. Nevertheless, some
DNA is entrapped within the aqueous interior of these liposomes. pH-sensitive
liposomes have been used to deliver DNA encoding the thymidine
kinase gene to cell monolayers in culture. Expression of the exogenous gene
was detected in the target cells (Zhou et al., J. Controlled Release, 1992, 19: 
269-74). 

Another contemplated liposomal composition includes phospholipids
other than naturally-derived phosphatidylcholine. Neutral liposome
compositions, for example, can be formed from dimyristoyl
phosphatidylcholine (DMPC) or dipalmitoyl phosphatidylcholine (DPPC).
Anionic liposome compositions generally are formed from dimyristoyl
phosphatidylglycerol, while anionic fusogenic liposomes are formed primarily
from dioleoyl phosphatidylethanolamine (DOPE). Another type of liposomal
composition is formed from phosphatidylcholine (PC) such as, for example,
soybean PC and egg PC. Another type is formed from mixtures of
phospholipid and/or phosphatidylcholine and/or cholesterol.

Also contemplated are "sterically stabilized" liposomes, a term that refers to
liposomes comprising one or more specialized lipids that, when incorporated
into liposomes, result in enhanced circulation lifetimes relative to liposomes
lacking such specialized lipids. Examples of sterically stabilized liposomes are
those in which part of the vesicle-forming lipid portion of the liposome (A)
comprises one or more glycolipids, such as monosialoganglioside GM1, or (B)
is derivatized with one or more hydrophilic polymers, such as a polyethylene
glycol (PEG) moiety. While not wishing to be bound by any particular theory,
it is thought in the art that, at least for sterically stabilized liposomes
containing gangliosides, sphingomyelin, or PEG-derivatized lipids, the
enhanced circulation half-life of these sterically stabilized liposomes derives
from a reduced uptake into cells of the reticuloendothelial system (RES)
(Allen et al., FEBS Lett., 1987, 223: 42; Wu et al., Cancer Res., 1993, 53: 3765).

Many liposomes comprising lipids derivatized with one or more
hydrophilic polymers, and methods of preparation thereof, are known in the
art. For example, Sunamoto et al. (Bull. Chem. Soc. Jpn., 1980, 53: 2778)
described liposomes comprising a nonionic detergent, 2C1215G, that contains
a PEG moiety. Illum et al. (FEBS Lett., 1984, 167: 79) noted that hydrophilic
coating of polystyrene particles with polymeric glycols results in significantly
enhanced blood half-lives. Synthetic phospholipids modified by the
attachment of carboxylic groups of polyalkylene glycols (e.g., PEG) are
described by Sears (U.S. Pat. Nos. 4,426,330 and 4,534,899). Klibanov et al.
(FEBS Lett., 1990, 268: 235) described experiments demonstrating that
liposomes comprising phosphatidylethanolamine (PE) derivatized with PEG
or PEG stearate have significant increases in blood circulation half-lives.
Blume et al. (Biochimica et Biophysica Acta, 1990, 1029: 91) extended such
observations to other PEG-derivatized phospholipids, e.g., DSPE-PEG,
formed from the combination of distearoylphosphatidylethanolamine (DSPE)
and PEG. Liposomes having covalently bound PEG moieties on their external
surface are described in European Patent No. EP 0 445 131 B1 and WO
90/04384 to Fisher. Liposome compositions containing 1-20 mole percent of
PE derivatized with PEG, and methods of use thereof, are described by, e.g.,
Woodle et al. (U.S. Pat. Nos. 5,013,556 and 5,356,633) and Martin et al. (U.S.
Pat. No. 5,213,804 and European Patent No. EP 0 496 813 B1). Liposomes
comprising a number of other lipid-polymer conjugates are disclosed in WO
91/05545 and U.S. Pat. No. 5,225,212 (both to Martin et al.) and in WO
94/20073 (Zalipsky et al.). Liposomes comprising PEG-modified ceramide
lipids are described in WO 96/10391 (Choi et al.). U.S. Pat. No. 5,540,935
(Miyazaki et al.) and U.S. Pat. No. 5,556,948 (Tagawa et al.) describe
PEG-containing liposomes that can be further derivatized with functional
moieties on their surfaces.

Methods of encapsulating nucleic acids in liposomes are also known in
the art. WO 96/40062 to Thierry et al. discloses methods for
encapsulating high molecular weight nucleic acids in liposomes. U.S. Pat. No.
5,264,221 to Tagawa et al. discloses protein-bonded liposomes and asserts
that the contents of such liposomes may include an antisense RNA. U.S. Pat.
No. 5,665,710 to Rahman et al. describes certain methods of encapsulating
oligodeoxynucleotides in liposomes.

Surfactants find wide application in formulations such as emulsions
(including microemulsions) and liposomes. The most common way of
classifying and ranking the properties of the many different types of
surfactants, both natural and synthetic, is by the use of the
hydrophile/lipophile balance (HLB). The nature of the hydrophilic group
(also known as the "head") provides the most useful means for categorizing
the different surfactants used in formulations (Rieger, in Pharmaceutical
Dosage Forms, Marcel Dekker, Inc., New York, N.Y., 1988, p. 285).

If the surfactant molecule is not ionized, it is classified as a nonionic
surfactant. Nonionic surfactants find wide application in pharmaceutical and
cosmetic products and are usable over a wide range of pH values. In general,
their HLB values range from 2 to about 18 depending on their structure.
Nonionic surfactants include nonionic esters such as ethylene glycol esters,
propylene glycol esters, glyceryl esters, polyglyceryl esters, sorbitan esters,
sucrose esters, and ethoxylated esters. Nonionic alkanolamides and ethers
such as fatty alcohol ethoxylates, propoxylated alcohols, and
ethoxylated/propoxylated block polymers are also included in this class. The
polyoxyethylene surfactants are the most popular members of the nonionic
surfactant class.

If the surfactant molecule carries a negative charge when it is dissolved 
or dispersed in water, the surfactant is classified as anionic. Anionic 
surfactants include carboxylates such as soaps, acyl lactylates, acyl amides of 
amino acids, esters of sulfuric acid such as alkyl sulfates and ethoxylated alkyl
sulfates, sulfonates such as alkyl benzene sulfonates, acyl isethionates, acyl
taurates and sulfosuccinates, and phosphates. The most important members of
the anionic surfactant class are the alkyl sulfates and the soaps. 

If the surfactant molecule carries a positive charge when it is dissolved 
or dispersed in water, the surfactant is classified as cationic. Cationic
surfactants include quaternary ammonium salts and ethoxylated amines. The
quaternary ammonium salts are the most used members of this class. 

If the surfactant molecule has the ability to carry either a positive or 
negative charge, the surfactant is classified as amphoteric. Amphoteric 
surfactants include acrylic acid derivatives, substituted alkylamides,
N-alkylbetaines and phosphatides.
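
The four charge-based classes described above can be summarized as a small decision rule. The following Python sketch is illustrative only: the names `SurfactantClass` and `classify_surfactant` and the two boolean inputs are assumptions introduced here, not terms from the specification, and the rule encodes only the definitions given in the preceding paragraphs.

```python
from enum import Enum


class SurfactantClass(Enum):
    NONIONIC = "nonionic"
    ANIONIC = "anionic"
    CATIONIC = "cationic"
    AMPHOTERIC = "amphoteric"


def classify_surfactant(can_carry_negative: bool, can_carry_positive: bool) -> SurfactantClass:
    """Classify a surfactant by the charge it can carry in water.

    Hypothetical helper: the inputs describe whether the molecule can carry
    a negative and/or a positive charge when dissolved or dispersed in water.
    """
    if can_carry_negative and can_carry_positive:
        # Able to carry either charge: amphoteric.
        return SurfactantClass.AMPHOTERIC
    if can_carry_negative:
        return SurfactantClass.ANIONIC
    if can_carry_positive:
        return SurfactantClass.CATIONIC
    # Not ionized at all: nonionic.
    return SurfactantClass.NONIONIC


# Example inputs chosen to match classes named in the text.
alkyl_sulfate = classify_surfactant(can_carry_negative=True, can_carry_positive=False)
quaternary_ammonium = classify_surfactant(can_carry_negative=False, can_carry_positive=True)
```

In this sketch an alkyl sulfate is classified as anionic and a quaternary ammonium salt as cationic, in line with the examples given for each class.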

The use of surfactants in drug products, formulations and in emulsions
has been reviewed (Rieger, in Pharmaceutical Dosage Forms, Marcel
Dekker, Inc., New York, N.Y., 1988, p. 285).

In one embodiment, the present invention employs various penetration
enhancers to effect the efficient delivery of nucleic acids and other agents,
particularly oligonucleotides, to the skin of animals. Most drugs are present in
solution in both ionized and nonionized forms. However, usually only lipid
soluble or lipophilic drugs readily cross cell membranes. It has been
discovered that even non-lipophilic drugs may cross cell membranes if the
membrane to be crossed is treated with a penetration enhancer. In addition to
aiding the diffusion of non-lipophilic drugs across cell membranes, penetration
enhancers also enhance the permeability of lipophilic drugs.

Penetration enhancers may be classified as belonging to one of five
broad categories, i.e., surfactants, fatty acids, bile salts, chelating agents, and
non-chelating non-surfactants (Lee et al., Critical Reviews in Therapeutic
Drug Carrier Systems, 1991, p. 92). Each of the above mentioned classes of
penetration enhancers is described below in greater detail.

Another embodiment of the invention contemplates pharmaceutical
compositions comprising surfactants. Surfactants (or "surface-active agents")
are chemical entities which, when dissolved in an aqueous solution, reduce the
surface tension of the solution or the interfacial tension between the aqueous
solution and another liquid, with the result that absorption of oligonucleotides
through the mucosa is enhanced. In addition to bile salts and fatty acids, these
penetration enhancers include, for example, sodium lauryl sulfate,
polyoxyethylene-9-lauryl ether and polyoxyethylene-20-cetyl ether (Lee et
al., Crit. Rev. Therap. Drug Carrier Systems, 1991, p. 92); and
perfluorochemical emulsions, such as FC-43 (Takahashi et al., J. Pharm.
Pharmacol., 1988, 40: 252).

Another embodiment contemplates the use of various fatty acids and
their derivatives as penetration enhancers. These include, for example, oleic
acid, lauric acid, capric acid (n-decanoic acid), myristic acid, palmitic acid,
stearic acid, linoleic acid, linolenic acid, dicaprate, tricaprate, monoolein
(1-monooleoyl-rac-glycerol), dilaurin, caprylic acid, arachidonic acid, glycerol
1-monocaprate, 1-dodecylazacycloheptan-2-one, acylcarnitines, acylcholines,
C1-10 alkyl esters thereof (e.g., methyl, isopropyl and t-butyl), and mono- and
di-glycerides thereof (i.e., oleate, laurate, caprate, myristate, palmitate, stearate,
linoleate, and the like) (Lee et al., 1991; Muranishi, Crit. Rev. Therap. Drug
Carrier Systems, 1990, 7: 1-33; El Hariri et al., J. Pharm. Pharmacol., 1992,
44: 651-4).

The compositions comprising the active agents of the invention may
further comprise bile salts. The physiological role of bile includes the
facilitation of dispersion and absorption of lipids and fat-soluble vitamins
(Brunton, Chapter 38 in: Goodman & Gilman's The Pharmacological
Basis of Therapeutics, 9th Ed., Hardman et al. Eds., McGraw-Hill, N.Y.,
1996, pp. 934-935). Various natural bile salts, and their synthetic derivatives,
act as penetration enhancers. Thus, the term "bile salts" includes any of the
naturally occurring components of bile as well as any of their synthetic
derivatives. The bile salts of the invention include, for example, cholic acid
(or its pharmaceutically acceptable sodium salt, sodium cholate),
dehydrocholic acid (sodium dehydrocholate), deoxycholic acid (sodium
deoxycholate), glucholic acid (sodium glucholate), glycocholic acid (sodium
glycocholate), glycodeoxycholic acid (sodium glycodeoxycholate), taurocholic
acid (sodium taurocholate), taurodeoxycholic acid (sodium
taurodeoxycholate), chenodeoxycholic acid (sodium chenodeoxycholate),
ursodeoxycholic acid (UDCA), sodium tauro-24,25-dihydro-fusidate
(STDHF), sodium glycodihydrofusidate and polyoxyethylene-9-lauryl ether
(POE) (Lee et al., 1991; Swinyard, Chapter 39 in: Remington's
Pharmaceutical Sciences, 18th Ed., Gennaro, ed., Mack Publishing Co.,
Easton, Pa., 1990, pages 782-783; Muranishi, 1990; Yamamoto et al., J.
Pharm. Exp. Ther., 1992, 263: 25; Yamashita et al., J. Pharm. Sci., 1990, 79:
579-83).

The invention further contemplates compositions comprising chelating
agents. Chelating agents can be defined as compounds that remove metallic
ions from solution by forming complexes therewith, with the result that
absorption of oligonucleotides through the mucosa is enhanced. With regard
to their use as penetration enhancers when the active agent is an
antisense agent, chelating agents have the added advantage of also serving as
DNase inhibitors, as most characterized DNA nucleases require a divalent
metal ion for catalysis and are thus inhibited by chelating agents (Jarrett, J.
Chromatogr., 1993, 618: 315-39). Chelating agents of the invention include
but are not limited to disodium ethylenediaminetetraacetate (EDTA), citric
acid, salicylates (e.g., sodium salicylate, 5-methoxysalicylate and
homovanillate), N-acyl derivatives of collagen, laureth-9 and N-amino acyl
derivatives of beta-diketones (enamines) (Lee et al., 1991; Muranishi, 1990;
Buur et al., J. Control. Rel., 1990, 14: 43-51).

The invention also contemplates pharmaceutical compositions
comprising active agents and non-chelating non-surfactants. Non-chelating
non-surfactant penetration enhancing compounds can be defined as
compounds that demonstrate insignificant activity as chelating agents or as
surfactants, but that nonetheless enhance absorption of oligonucleotides
through the alimentary mucosa (Muranishi, 1990). This class of penetration
enhancers includes, for example, unsaturated cyclic ureas, 1-alkyl- and
1-alkenylazacyclo-alkanone derivatives (Lee et al., 1991); and non-steroidal
anti-inflammatory agents such as diclofenac sodium, indomethacin and
phenylbutazone (Yamashita et al., J. Pharm. Pharmacol., 1987, 39: 621-6).

For pharmaceutical compositions comprising oligonucleotides, agents
that enhance uptake of oligonucleotides at the cellular level may also be added
to the pharmaceutical and other compositions of the present invention. For
example, cationic lipids, such as lipofectin (Junichi et al., U.S. Pat. No.
5,705,188), cationic glycerol derivatives, and polycationic molecules, such as
polylysine (Lollo et al., PCT Application WO 97/30731), are also known to
enhance the cellular uptake of oligonucleotides. 

Other agents may be utilized to enhance the penetration of the
administered nucleic acids, including glycols such as ethylene glycol and
propylene glycol, pyrrols such as 2-pyrrol, azones, and terpenes such as
limonene and menthone.

Certain compositions of the present invention also incorporate carrier
compounds in the formulation. As used herein, "carrier compound" or
"carrier" can refer to a nucleic acid, or analog thereof, which is inert (i.e., does
not possess biological activity per se) but is recognized as a nucleic acid by in
vivo processes that reduce the bioavailability of a nucleic acid having
biological activity by, for example, degrading the biologically active nucleic
acid or promoting its removal from circulation. The coadministration of a
nucleic acid and a carrier compound, typically with an excess of the latter
substance, can result in a substantial reduction of the amount of nucleic acid
recovered in the liver, kidney or other extracirculatory reservoirs, presumably
due to competition between the carrier compound and the nucleic acid for a
common receptor. For example, the recovery of a partially phosphorothioate
oligonucleotide in hepatic tissue can be reduced when it is coadministered
with polyinosinic acid, dextran sulfate, polycytidic acid or
4-acetamido-4'isothiocyano-stilbene-2,2'-disulfonic acid (Miyao et al.,
Antisense Res. Dev., 1995, 5: 115-121; Takakura et al., Antisense & Nucl.
Acid Drug Dev., 1996, 6: 177-183).

The pharmaceutical compositions disclosed herein may also comprise
one or more pharmaceutically acceptable excipients. In contrast to carrier
compounds described above, these excipients include a pharmaceutically
acceptable solvent, suspending agent or any other pharmacologically inert
vehicle for delivering one or more nucleic acids or other active agents to an
animal. The excipient may be liquid or solid and is selected, with the planned
manner of administration in mind, so as to provide for the desired bulk,
consistency, etc., when combined with a nucleic acid or other active agent and
the other components of a given pharmaceutical composition. Typical
pharmaceutical carriers include, but are not limited to, binding agents (e.g.,
pregelatinized maize starch, polyvinylpyrrolidone or hydroxypropyl
methylcellulose, etc.); fillers (e.g., lactose and other sugars, microcrystalline
cellulose, pectin, gelatin, calcium sulfate, ethyl cellulose, polyacrylates or
calcium hydrogen phosphate, etc.); lubricants (e.g., magnesium stearate, talc,
silica, colloidal silicon dioxide, stearic acid, metallic stearates, hydrogenated
vegetable oils, corn starch, polyethylene glycols, sodium benzoate, sodium
acetate, etc.); disintegrants (e.g., starch, sodium starch glycolate, etc.); and
wetting agents (e.g., sodium lauryl sulphate, etc.).

Pharmaceutically acceptable organic or inorganic excipients suitable
for non-parenteral administration, which do not deleteriously react with
nucleic acids, can also be used to formulate the compositions of the present
invention. Suitable pharmaceutically acceptable carriers include, but are not
limited to, water, salt solutions, alcohols, polyethylene glycols, gelatin,
lactose, amylose, magnesium stearate, talc, silicic acid, viscous paraffin,
hydroxymethylcellulose, polyvinylpyrrolidone and the like.

Formulations for topical administration of nucleic acids and other 
contemplated active agents may include sterile and non-sterile aqueous 
solutions, non-aqueous solutions in common solvents such as alcohols, or
solutions of the nucleic acids in liquid or solid oil bases. The solutions may
also contain buffers, diluents and other suitable additives. Pharmaceutically 
acceptable organic or inorganic excipients suitable for non-parenteral 
administration that do not deleteriously react with nucleic acids or other 
contemplated active agents can be used. 

Suitable pharmaceutically acceptable excipients include, but are not
limited to, water, salt solutions, alcohol, polyethylene glycols, gelatin, lactose,
amylose, magnesium stearate, talc, silicic acid, viscous paraffin, 
hydroxymethylcellulose, polyvinylpyrrolidone and the like. 

The compositions of the present invention may additionally contain
other adjunct components conventionally found in pharmaceutical
compositions, at their art-established usage levels. Thus, for example, the
compositions may contain additional, compatible, pharmaceutically-active
materials such as, e.g., antipruritics, astringents, local anesthetics or
anti-inflammatory agents, or may contain additional materials useful in physically
formulating various dosage forms of the compositions of the present
invention, such as dyes, flavoring agents, preservatives, antioxidants,
opacifiers, thickening agents and stabilizers. However, such materials, when
added, should not unduly interfere with the biological activities of the
components of the compositions of the present invention. The formulations
can be sterilized and, if desired, mixed with auxiliary agents, e.g., lubricants,
preservatives, stabilizers, wetting agents, emulsifiers, salts for influencing
osmotic pressure, buffers, colorings, flavorings and/or aromatic substances and
the like which do not deleteriously interact with the nucleic acid(s) of the
formulation.

Aqueous suspensions may contain substances that increase the 
viscosity of the suspension including, for example, sodium
carboxymethylcellulose, sorbitol and/or dextran. The suspension may also
contain stabilizers. 

Certain embodiments of the invention provide pharmaceutical
compositions containing (a) one or more antisense compounds, and (b) one or
more other chemotherapeutic agents which function by a non-antisense
mechanism. Examples of such chemotherapeutic agents include, but are not
limited to, anticancer drugs such as daunorubicin, dactinomycin, doxorubicin,
bleomycin, mitomycin, nitrogen mustard, chlorambucil, melphalan,
cyclophosphamide, 6-mercaptopurine, 6-thioguanine, cytarabine (CA),
5-fluorouracil (5-FU), floxuridine (5-FUdR), methotrexate (MTX), colchicine,
vincristine, vinblastine, etoposide, teniposide, cisplatin and diethylstilbestrol
(DES). See, generally, The Merck Manual of Diagnosis and Therapy,
1206-28 (15th Ed., Berkow et al., eds., 1987, Rahway, N.J.).
Anti-inflammatory drugs, including but not limited to nonsteroidal
anti-inflammatory drugs and corticosteroids, and antiviral drugs, including but not
limited to ribavirin, vidarabine, acyclovir and ganciclovir, may also be
combined in compositions of the invention. See, generally, The Merck
Manual of Diagnosis and Therapy, 2499-2506 and 46-49 (15th Ed.,
Berkow et al., eds., 1987, Rahway, N.J.), respectively. Other non-antisense
chemotherapeutic agents are also within the scope of this invention. Two or
more combined compounds may be used together or sequentially.

In another related embodiment, compositions of the invention may 
contain one or more antisense compounds or other active agents. Two or more
combined compounds may be used together or sequentially. 

The formulation of therapeutic compositions and their subsequent
administration is believed to be within the skill of those in the art. Dosing is
dependent on severity and responsiveness of the disease state to be treated,
with the course of treatment lasting from several days to several months, or
until a cure is effected or a diminution of the disease state is achieved.
Optimal dosing schedules can be calculated from measurements of drug
accumulation in the body of the patient. Persons of ordinary skill can easily
determine optimum dosages, dosing methodologies and repetition rates.
Optimum dosages may vary depending on the relative potency of individual
oligonucleotides, and can generally be estimated based on EC50s found to be
effective in in vitro and in vivo animal models. In general, dosage is from 0.01
µg to 100 g per kg of body weight, and may be given once or more daily,
weekly, monthly or yearly, or even once every 2 to 20 years. Persons of
ordinary skill in the art can easily estimate repetition rates for dosing based on
measured residence times and concentrations of the drug in bodily fluids or
tissues. Following successful treatment, it may be desirable to have the patient
undergo maintenance therapy to prevent the recurrence of the disease state,
wherein the oligonucleotide is administered in maintenance doses, ranging
from 0.01 µg to 100 g per kg of body weight, once or more daily, to once
every 20 years.
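
The per-kilogram range stated above scales linearly with body weight. The following Python sketch shows that arithmetic only; the function names `dose_bounds_ug` and `is_within_stated_range` and the 70 kg example weight are assumptions introduced for illustration, and the sketch is not a dosing recommendation.

```python
from decimal import Decimal

# Range stated in the text, expressed in micrograms per kg of body weight.
LOW_UG_PER_KG = Decimal("0.01")                           # 0.01 µg per kg
HIGH_UG_PER_KG = Decimal("100") * Decimal("1000000")      # 100 g per kg, in µg


def dose_bounds_ug(body_weight_kg):
    """Return (low, high) total dose in micrograms for a given body weight.

    Hypothetical helper: simply multiplies the stated per-kg bounds by weight.
    """
    weight = Decimal(str(body_weight_kg))
    if weight <= 0:
        raise ValueError("body weight must be positive")
    return LOW_UG_PER_KG * weight, HIGH_UG_PER_KG * weight


def is_within_stated_range(dose_ug_per_kg):
    """Check whether a per-kg dose (in µg/kg) lies inside the stated range."""
    dose = Decimal(str(dose_ug_per_kg))
    return LOW_UG_PER_KG <= dose <= HIGH_UG_PER_KG


# Example with an assumed 70 kg body weight.
low_ug, high_ug = dose_bounds_ug(70)
```

Decimal arithmetic is used here so that the very small lower bound and the very large upper bound are both represented exactly.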

VI. Polypeptide and Peptides 

The polypeptides or peptides of the invention are isolated polypeptides
or peptides. Preferably these polypeptides are encoded by the smORF
identified by the in silico process, but they can also be prepared synthetically
or from a recombinant nucleic acid that encodes the same protein but, owing to
the degeneracy of the genetic code, differs in sequence from the smORF
identified in silico.
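
Code degeneracy can be illustrated with two different DNA sequences that translate to the same peptide. In the Python sketch below, the two short sequences are hypothetical examples invented for illustration (they are not smORFs from the Sequence Listing), and `CODON_TABLE` is a deliberately small subset of the standard genetic code covering only the codons used.

```python
# Subset of the standard genetic code, limited to the codons used below.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A",
    "AAA": "K", "AAG": "K",
    "TTA": "L", "CTG": "L",
    "TAA": "*",  # stop codon
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    # A codon outside the small table raises KeyError by design.
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical sequences: same protein, different codon choices.
ORIGINAL = "ATGGCTAAATTATAA"
RECODED = "ATGGCCAAGCTGTAA"

same_protein = translate(ORIGINAL) == translate(RECODED)
different_dna = ORIGINAL != RECODED
```

Both sequences translate to the same peptide even though they differ at three nucleotide positions, which is the situation the paragraph above describes for a recombinant nucleic acid that differs from the smORF identified in silico.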

As used herein, with respect to peptides, the terms "isolated peptides"
and "isolated polypeptides" and "isolated protein" mean that the compounds
are substantially pure and are essentially free of other substances with which
they may be found in nature or in vivo systems to an extent practical and
appropriate for their intended use. In particular, the compounds are
sufficiently pure and are sufficiently free from other biological constituents of
their host cells so as to be useful in, for example, producing pharmaceutical
preparations or sequencing. Because an isolated peptide (which as used herein
also includes polypeptides and proteins) of the invention may be admixed with
a pharmaceutically acceptable carrier in a pharmaceutical preparation, the
peptide may comprise only a small percentage by weight of the preparation.
The peptide is nonetheless substantially pure in that it has been substantially
separated from the substances with which it may be associated in living
systems.

The polypeptides and proteins of the invention can be used to prepare 
antibodies, to identify ligand binding partners, in competition assays, and the
like as would be known in the art. These assays using fragments of the
proteins may be based on motifs identified in the polypeptides, such as the
representative examples shown in Table 3 (Motifs). 

VII. Antibodies, Antibody Fragments and Immunologically Active
Immunogens

The invention also contemplates preparation and use of
immunoglobulins against the proteins encoded by the smORFs. The term
"immunoglobulins" is meant to include antibodies, antibody fragments (e.g.,
Fab, Fab', Fv, scFv, and F(ab')2), bispecific antibodies, polyclonal and
monoclonal antibodies, human and humanized antibodies, bivalent antibodies
and antibody fragments and the like.


A. Humanized and Primatized® Antibodies 

The invention further provides humanized immunoglobulins (or 
antibodies). The humanized antibodies are preferably specific to the protein 
encoded by a specific smORF. These humanized and primatized® antibodies
are useful as therapeutic and diagnostic reagents in their own right or can be
combined to form a humanized or primatized® bispecific antibody possessing 
both of the binding specificities of its components. 

The humanized and primatized® forms of immunoglobulins have
variable framework region(s) substantially from a human immunoglobulin
(termed an acceptor immunoglobulin) and complementarity determining
regions substantially from a mouse immunoglobulin (referred to as the donor
immunoglobulin). The constant region(s), if present, are also substantially
from a human immunoglobulin. The humanized antibodies exhibit a specific
binding affinity for their respective antigens of at least 10^7, 10^8, 10^9, or
10^10 M^-1. Often the upper and lower limits of binding affinity of the humanized
antibodies are within a factor of three or five or ten of that of the mouse (or
other animal) antibody from which they were derived.
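
The affinity criteria above can be expressed as two simple numeric checks. The Python sketch below is an illustration under stated assumptions: affinities are taken to be association constants (Ka, in M^-1), "within a factor of N" is interpreted as the ratio of humanized to parent Ka lying between 1/N and N, and the function names and example values are hypothetical.

```python
def dissociation_constant(ka_per_molar: float) -> float:
    """Convert an association constant Ka (M^-1) to Kd (M), using Kd = 1/Ka."""
    if ka_per_molar <= 0:
        raise ValueError("Ka must be positive")
    return 1.0 / ka_per_molar


def meets_affinity_floor(ka_per_molar: float, floor: float = 1e7) -> bool:
    """Check a Ka against a minimum such as 1e7, 1e8, 1e9 or 1e10 M^-1."""
    return ka_per_molar >= floor


def within_factor(ka_humanized: float, ka_parent: float, factor: float) -> bool:
    """Check that the humanized Ka is within `factor`-fold of the parent Ka.

    Assumed interpretation: 1/factor <= ka_humanized / ka_parent <= factor.
    """
    if ka_humanized <= 0 or ka_parent <= 0 or factor < 1:
        raise ValueError("affinities must be positive and factor must be >= 1")
    ratio = ka_humanized / ka_parent
    return 1.0 / factor <= ratio <= factor


# Hypothetical example: parent mouse antibody Ka = 1e9, humanized Ka = 3e8.
ok_within_five = within_factor(3e8, 1e9, 5)
ok_within_three = within_factor(3e8, 1e9, 3)
```

With these example values the humanized antibody falls within a factor of five of the parent but not within a factor of three, since the ratio of the two affinities is 0.3.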

A "humanized monoclonal antibody" as used herein is a human
monoclonal antibody or functionally active fragment thereof having human
constant regions and a region that binds to a protein encoded by a smORF,
wherein that region is from a mammal of a species other than a human.
Humanized monoclonal antibodies may be made by any method known in the
art. A "primatized® monoclonal antibody" would be one having a domain
from a primate, such as a cynomolgus macaque. For example, see Anderson
et al., 1997, Clin. Immunol. Immunopathol. 84: 73-84 and U.S. Patent Nos.
6,001,358 and 6,113,898.

Humanized monoclonal antibodies, for example, may be constructed
by replacing the non-CDR regions of a non-human mammalian antibody with
similar regions of human antibodies while retaining the epitopic specificity of
the original antibody. For example, non-human CDRs and optionally some of
the framework regions may be covalently joined to human FR and/or Fc/pFc'
regions to produce a functional antibody. Certain corporations are now
humanizing antibodies from specific murine antibody regions, e.g., Protein
Design Labs (Mountain View, Calif.).

European Patent Application 0 239 400 provides an exemplary
teaching of the production and use of humanized monoclonal antibodies in
which at least the complementarity determining region (CDR) portion of a
murine (or other non-human mammal) antibody is included in the humanized
antibody. Briefly, the following methods are useful for constructing a
humanized CDR monoclonal antibody including at least a portion of a mouse
CDR. A first replicable expression vector is prepared that includes a suitable
promoter operably linked to a DNA sequence encoding at least a variable
domain of an Ig heavy or light chain, the variable domain comprising
framework regions from a human antibody and a CDR region of a murine
antibody. Optionally a second replicable expression vector is prepared which
includes a suitable promoter operably linked to a DNA sequence encoding at
least the variable domain of a complementary human Ig light or heavy chain
respectively. A cell line is then transformed with the vectors. Preferably the
cell line is an immortalized mammalian cell line of lymphoid origin, such as a
myeloma cell line, or is a normal lymphoid cell that has been immortalized by
transformation with a virus. The transformed cell line is then cultured under
conditions known to those of skill in the art to produce the humanized
antibody.

As set forth in European Patent Application 0 239 400, several
techniques are well known in the art for creating the particular antibody
domains to be inserted into the replicable vector. For example, the DNA
sequence encoding the domain may be prepared by oligonucleotide synthesis.
Alternatively, a synthetic gene lacking the CDR regions may be prepared in
which four framework regions are fused together with suitable restriction
sites at the junctions, such that double-stranded synthetic or restricted
subcloned CDR cassettes with sticky ends can be ligated at the junctions of
the framework regions. Another method involves the preparation of the DNA
sequence encoding the variable CDR-containing domain by oligonucleotide
site-directed mutagenesis. Each of these methods is well known in the art.
Therefore, those skilled in the art may construct humanized antibodies
containing a murine CDR region without destroying the specificity of the
antibody for its epitope.

As noted above, such humanized antibodies may be produced in which
some or all of the FR regions of the deposited monoclonal antibody have been
replaced by homologous human FR regions. In addition, the Fc portions may
be replaced so as to produce IgA or IgM as well as human IgG antibodies
bearing some or all of the CDRs of the deposited monoclonal antibody. In a
more preferred embodiment, a murine CDR is grafted into the framework
region of a human antibody to prepare the "humanized antibody." See, e.g., L.
Riechmann et al., 1988, Nature 332: 323; M. S. Neuberger et al., 1985, Nature
314: 268; and EPA 0 239 400 (published Sep. 30, 1987).

In one embodiment of the invention, the peptide containing a region
that binds to a polypeptide encoded by a smORF is a functionally active
antibody fragment. Significantly, as is well known in the art, only a small
portion of an antibody molecule, the paratope, is involved in the binding of the
antibody to its epitope (see, in general, Clark, W. R. (1986) The
Experimental Foundations of Modern Immunology, Wiley & Sons, Inc.,
New York; Roitt, I. (1991) Essential Immunology, 7th Ed., Blackwell
Scientific Publications, Oxford). The pFc' and Fc regions of the antibody, for
example, are effectors of the complement cascade but are not involved in
antigen binding. An antibody from which the pFc' region has been
enzymatically cleaved, or which has been produced without the pFc' region,
designated an F(ab')2 fragment, retains both of the antigen binding sites of an
intact antibody. An isolated F(ab')2 fragment is referred to as a bivalent
monoclonal fragment because of its two antigen binding sites. Similarly, an
antibody from which the Fc region has been enzymatically cleaved, or which
has been produced without the Fc region, designated a Fab fragment, retains
one of the antigen binding sites of an intact antibody molecule. Proceeding
further, Fab fragments consist of a covalently bound antibody light chain and a
portion of the antibody heavy chain denoted Fd (heavy chain variable region).
The Fd fragments are the major determinant of antibody specificity (a single
Fd fragment may be associated with up to ten different light chains without
altering antibody specificity) and Fd fragments retain epitope-binding ability
in isolation. Another preferred fragment is the scFv fragment.

(i) Mouse Antibodies for Humanization. The starting material for
production of a specific humanized antibody could be a protein or
immunologically active portion thereof encoded by SEQ ID NOS: 674-1346 or
polypeptides identified by the disclosed in silico methods.

(ii) Selection of Human Antibodies to Supply Framework Residues.
The substitution of mouse CDRs into a human variable domain framework is
most likely to result in retention of their correct spatial orientation if the
human variable domain framework adopts the same or similar conformation to
the mouse variable framework from which the CDRs originated. This is
achieved by obtaining the human variable domains from human antibodies
whose framework sequences exhibit a high degree of sequence identity with
the murine variable framework domains from which the CDRs were derived.
The heavy and light chain variable framework regions can be derived from the
same or different human antibody sequences. The human antibody sequences
can be the sequences of naturally occurring human antibodies or can be
consensus sequences of several human antibodies.

Suitable human antibody sequences are identified by computer
comparisons of the amino acid sequences of the mouse variable regions with
the sequences of known human antibodies. The comparison is performed
separately for heavy and light chains but the principles are similar for each.
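
The computer comparison described above can be illustrated with a minimal
sketch. It assumes the mouse and human framework sequences have already been
aligned to equal length (gaps written as "-"), and it ranks candidate human
frameworks by percent identity. The function names and the toy sequences are
hypothetical and are not taken from the Sequence Listing; a real comparison
would normally use an alignment program and a database of human antibody
sequences.

```python
from typing import Dict, List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rank_human_frameworks(
    mouse_fw: str, human_fws: Dict[str, str]
) -> List[Tuple[str, float]]:
    """Return (name, percent identity) pairs, most similar framework first."""
    scored = [(name, percent_identity(mouse_fw, seq))
              for name, seq in human_fws.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up framework fragments used only to exercise the functions.
    mouse = "EVQLQQSGAE"
    candidates = {"human_A": "EVQLVQSGAE", "human_B": "QVQLVESGGG"}
    for name, pid in rank_human_frameworks(mouse, candidates):
        print(f"{name}: {pid:.1f}% identity")
```

As the text notes, the same ranking would be run once for the heavy chain and
once for the light chain.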

(iii) Computer Modeling. The unnatural juxtaposition of murine (or
other animal) CDR regions with a human variable framework region can result
in unnatural conformational restraints, which, unless corrected by substitution
of certain amino acid residues, lead to loss of binding affinity. The selection
of amino acid residues for substitution is determined, in part, by computer
modeling. Computer hardware and software for producing three-dimensional
images of immunoglobulin molecules are widely available. In general,
molecular models are produced starting from solved structures for
immunoglobulin chains or domains thereof. The chains to be modeled are
compared for amino acid sequence similarity with chains or domains of solved
three-dimensional structures, and the chains or domains showing the greatest
sequence similarity are selected as starting points for construction of the
molecular model. The solved starting structures are modified to allow for
differences between the actual amino acids in the immunoglobulin chains or
domains being modeled, and those in the starting structure. The modified
structures are then assembled into a composite immunoglobulin. Finally, the
model is refined by energy minimization and by verifying that all atoms are
within appropriate distances from one another and that bond lengths and
angles are within chemically acceptable limits.

Computer modeling can also be utilized to identify the portions of a
protein encoded by a smORF that have a good antigenic profile or
hydrophobicity profile. This can be performed using the Chou-Fasman
algorithm and the GOR method (Chou et al., 1978, Adv. Enzymol. Relat.
Areas Mol. Biol. 47: 45-147; and Garnier et al., 1978, J. Mol. Biol. 120:
97-120). The proteins can also be analyzed using various available computer
algorithms to determine whether the potential antigenic region is buried within
the protein or is exposed at the surface of the protein. See, e.g., David W.
Mount, Bioinformatics: Sequence and Genome Analysis 381-478 (Cold
Spring Harbor Laboratory Press, 2001). Alternatively, the antibodies and
fragments thereof can be prepared to bind to domains identified by protein
modeling, such as those of Table 3 (Motifs).
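
As an illustration of a hydrophobicity profile, the following sketch computes
a sliding-window average of per-residue hydropathy values and reports the most
hydrophilic window, which is often taken as a rough indicator of surface
exposure. It uses the Kyte-Doolittle hydropathy scale, which is an assumption
made here for illustration and is not one of the methods named above; the
function names, window size and test peptide are likewise hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7) -> List[float]:
    """Mean hydropathy of every window of the given length along seq."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KYTE_DOOLITTLE[res] for res in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def most_hydrophilic_window(seq: str, window: int = 7) -> Tuple[int, str, float]:
    """Return (start index, subsequence, mean score) of the lowest-scoring window."""
    profile = hydropathy_profile(seq, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, seq[start:start + window], profile[start]


if __name__ == "__main__":
    # Made-up peptide: a lysine-rich stretch flanked by isoleucines.
    start, fragment, score = most_hydrophilic_window("IIIKKKIII", window=3)
    print(start, fragment, round(score, 2))
```

A low (hydrophilic) window is only a candidate; whether the region is actually
exposed would still need the structural analysis described in the text.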

(iv) Substitution of Amino Acid Residues. As noted supra, the
humanized antibodies of the invention comprise variable framework region(s)
substantially from a human immunoglobulin and complementarity
determining regions substantially from a mouse immunoglobulin. Having
identified the complementarity determining regions of mouse antibodies and
appropriate human acceptor immunoglobulins, the next step is to determine
which, if any, residues from these components should be substituted to
optimize the properties of the resulting humanized antibody. In general,
substitution of human amino acid residues with murine should be minimized,
because introduction of murine residues increases the risk of the antibody
eliciting a human anti-murine antibody (HAMA) response in humans. Amino
acids are selected for substitution based on their possible influence on CDR
conformation and/or binding to antigen. Investigation of such possible
influences is by modeling, examination of the characteristics of the amino
acids at particular locations, or empirical observation of the effects of
substitution or mutagenesis of particular amino acids.

When an amino acid differs between a mouse variable framework
region and an equivalent human variable framework region, the human
framework amino acid should usually be substituted by the equivalent mouse
amino acid if it is reasonably expected that the amino acid:

(1) noncovalently contacts antigen directly, or

(2) is adjacent to a CDR region or otherwise interacts with a
CDR region (e.g., is within about 4-6 Å of a CDR region).

Other candidates for substitution are acceptor human framework amino acids
that are unusual for a human immunoglobulin at that position. These amino
acids can be substituted with amino acids from the equivalent position of more
typical human immunoglobulins. Alternatively, amino acids from equivalent
positions in the mouse antibody can be introduced into the human framework
regions when such amino acids are typical of human immunoglobulin at the
equivalent positions.
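
The proximity test in criterion (2) can be sketched as a simple distance
check on a molecular model. The example below assumes atom coordinates (in
Å) are already available from such a model; the residue labels, coordinates,
function names and the 5 Å default cutoff (chosen from within the 4-6 Å range
given above) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def min_distance(atoms_a: List[Coord], atoms_b: List[Coord]) -> float:
    """Smallest atom-to-atom distance between two sets of coordinates."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def framework_residues_near_cdr(
    framework: Dict[str, List[Coord]],
    cdr_atoms: List[Coord],
    cutoff: float = 5.0,
) -> List[str]:
    """Names of framework residues with any atom within cutoff of a CDR atom."""
    return [name for name, atoms in framework.items()
            if min_distance(atoms, cdr_atoms) <= cutoff]


if __name__ == "__main__":
    # Made-up coordinates: one framework residue close to the CDR, one far away.
    framework = {"FR_res_1": [(0.0, 0.0, 0.0)], "FR_res_2": [(20.0, 0.0, 0.0)]}
    cdr = [(3.0, 4.0, 0.0)]
    print(framework_residues_near_cdr(framework, cdr))
```

Residues flagged this way are only candidates for back-substitution; the text
leaves the final choice to modeling and empirical observation.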

In general, substitution of all or most of the amino acids fulfilling the 
above criteria is desirable. Occasionally, however, there is some ambiguity 
about whether a particular amino acid meets the above criteria, and alternative 
variant immunoglobulins are produced, one of which has that particular 
substitution, the other of which does not. 

Usually the CDR regions in humanized antibodies are substantially 
identical, and more usually, identical to the corresponding CDR regions in the 
mouse antibody from which they were derived. Although not usually 
desirable, it is sometimes possible to make one or more conservative amino 
acid substitutions of CDR residues without appreciably affecting the binding 
affinity of the resulting humanized immunoglobulin. Occasionally, 
substitutions of CDR regions can enhance binding affinity. 

Other than for the specific amino acid substitutions discussed above, 
the framework regions of humanized immunoglobulins are usually 
substantially identical, and more usually, identical to the framework regions of 
the human antibodies from which they were derived. Of course, many of the 
amino acids in the framework region make little or no direct contribution to 
the specificity or affinity of an antibody. Thus, many individual conservative 
substitutions of framework residues can be tolerated without appreciable 
change of the specificity or affinity of the resulting humanized 
immunoglobulin. 

(v) Production of Variable Regions. Having conceptually selected 
the CDR and framework components of humanized immunoglobulins, a 
variety of methods are available for producing such immunoglobulins. 
Because of the degeneracy of the code, a variety of nucleic acid sequences will 
encode each immunoglobulin amino acid sequence. The desired nucleic acid 
sequences can be produced by de novo solid-phase DNA synthesis or by PCR 
mutagenesis of an earlier prepared variant of the desired polynucleotide. All 
nucleic acids encoding the antibodies described in this application are 
expressly included in the invention. 
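
The effect of codon degeneracy can be made concrete with a short sketch that
counts how many distinct nucleic acid sequences encode a given amino acid
sequence under the standard genetic code. The function name and the example
peptides are hypothetical; stop codons are not considered.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 in total).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encoding_sequences(peptide: str) -> int:
    """Number of distinct coding sequences for peptide (stop codon excluded)."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide.upper())


if __name__ == "__main__":
    for peptide in ("MW", "AC", "LSR"):
        print(peptide, count_encoding_sequences(peptide))
```

Even a three-residue peptide such as Leu-Ser-Arg has 6 x 6 x 6 = 216 possible
coding sequences, so the count for a full variable region is very large.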

(vi) Selection of Constant Region. The variable segments of
humanized antibodies produced as described supra are typically linked to at
least a portion of an immunoglobulin constant region (Fc), typically that of a
human immunoglobulin. Human constant region DNA sequences can be
isolated in accordance with well-known procedures from a variety of human
cells, but preferably immortalized B-cells (see, e.g., WO87/02671).
Ordinarily, the antibody will contain both light chain and heavy chain constant
regions. The heavy chain constant region usually includes CH1, hinge, CH2,
CH3, and, sometimes, CH4 regions.

The humanized antibodies include antibodies having all types of
constant regions, including IgM, IgG, IgD, IgA and IgE, and any isotype,
including IgG1, IgG2, IgG3 and IgG4. When it is desired that the humanized
antibody exhibit cytotoxic activity, the constant domain is usually a
complement-fixing constant domain and the class is typically IgG1. When
such cytotoxic activity is not desirable, the constant domain may be of the
IgG2 class. The humanized antibody may comprise sequences from more than
one class or isotype.

(vii) Expression Systems. Nucleic acids encoding humanized light and
heavy chain variable regions, optionally linked to constant regions, are
inserted into expression vectors. The light and heavy chains can be cloned in
the same or different expression vectors. The DNA segments encoding
immunoglobulin chains are operably linked to control sequences in the
expression vector(s) that ensure the expression of immunoglobulin
polypeptides. Such control sequences include a signal sequence, a promoter,
an enhancer, and a transcription termination sequence (see Queen et al., 1989,
Proc. Natl. Acad. Sci. USA 86: 10029; WO 90/07861; Co et al., 1992, J.
Immunol. 148: 1149).

B. Fragments of Humanized Antibodies 

The humanized antibodies of the invention include fragments as well
as intact antibodies. Typically, these fragments compete with the intact
antibody from which they were derived for antigen binding. The fragments
typically bind with an affinity of at least 10^7 M^-1, and more typically 10^8
or 10^9 M^-1 (i.e., within the same ranges as the intact antibody). Humanized
antibody fragments include separate heavy chains, light chains, Fab, Fab',
F(ab')2, Fv, and scFv. Fragments are produced by recombinant DNA
techniques, or by enzymatic or chemical separation of intact immunoglobulins.

C. Recombinant Bispecific Antibodies

The methods discussed above for forming bispecific antibodies from
antibodies produced by hybridoma cells can also be applied or adapted to
production of bispecific antibodies from recombinantly expressed antibodies.
For example, bispecific antibodies can be produced by fusion of two cell lines
respectively expressing the component antibodies. Alternatively, the
component antibodies can be co-expressed in the same cell line. Bispecific
antibodies can also be formed by chemical cross-linking of component
recombinant antibodies.

Component recombinant antibodies can also be linked genetically. In
one approach, a bispecific antibody is expressed as a single fusion protein
comprising the four different variable domains from the two component
antibodies separated by spacers. For example, such a protein might comprise,
from one terminus to the other, the VL region of the first component antibody,
a spacer, the VH domain of the first component antibody, a second spacer, the
VH domain of the second component antibody, a third spacer, and the VL
domain of the second component antibody. See, e.g., Segal et al., 1992,
Biologic Therapy of Cancer Updates 2: 1-12.

In a further approach, bispecific antibodies are formed by linking
component antibodies to leucine zipper peptides. See generally Kostelny et
al., 1992, J. Immunol. 148: 1547-1553. Leucine zippers have the general
structural formula (Leucine-X1-X2-X3-X4-X5-X6)n, where X may be any of
the conventional 20 amino acids (Proteins, Structures and Molecular
Principles, (1984) Creighton (ed.), W. H. Freeman and Company, New
York), but is most likely to be an amino acid with high α-helix forming
potential, for example, alanine, valine, aspartic acid, glutamic acid, or
lysine (Richardson et al., 1988, Science 240: 1648), and n may be 3 or greater,
although typically n is 4 or 5.
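
The general formula above can be checked mechanically. The sketch below
counts how many consecutive seven-residue blocks at the start of a sequence
begin with leucine and tests that count against a minimum n. It is a pattern
check only, with hypothetical function names and made-up test sequences; it
says nothing about whether a peptide actually forms a zipper.

```python
def count_leading_heptads(seq: str) -> int:
    """Count consecutive complete 7-residue blocks, from the start of seq,
    whose first residue is leucine (L)."""
    count = 0
    for i in range(0, len(seq) - 6, 7):
        if seq[i].upper() != "L":
            break
        count += 1
    return count


def matches_zipper_formula(seq: str, min_repeats: int = 3) -> bool:
    """True if seq starts with at least min_repeats (Leu-X1..X6) blocks."""
    return count_leading_heptads(seq) >= min_repeats


if __name__ == "__main__":
    # Made-up sequences: four Leu-led heptads, then two followed by a break.
    print(matches_zipper_formula("LAAAAAA" * 4))
    print(matches_zipper_formula("LAAAAAA" * 2 + "AAAAAAA"))
```

The default of three repeats follows the statement that n may be 3 or
greater; passing min_repeats=4 or 5 would reflect the typical values.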

In the formation of bispecific antibodies, binding fragments of the
component antibodies are fused in-frame to first and second leucine zippers.
Suitable binding fragments include Fv, Fab, Fab', or the heavy chain. The
zippers can be linked to the heavy or light chain of the antibody binding
fragment and are usually linked to the C-terminal end. If a constant region or
a portion of a constant region is present, the leucine zipper is preferably linked
to the constant region or portion thereof. For example, in a Fab'-leucine zipper
fusion, the zipper is usually fused to the C-terminal end of the hinge. The
inclusion of leucine zippers fused to the respective component antibody
fragments promotes formation of heterodimeric fragments by annealing of the
zippers. When the component antibodies include portions of constant regions
(e.g., Fab' fragments), the annealing of zippers also serves to bring the
constant regions into proximity, thereby promoting bonding of constant
regions (e.g., in a F(ab')2 fragment). Typical human constant regions bond by
the formation of two disulfide bonds between hinge regions of the respective
chains. This bonding can be strengthened by engineering additional cysteine
residue(s) into the respective hinge regions, which allows formation of
additional disulfide bonds.

Leucine zippers linked to antibody binding fragments can be produced
in various ways. For example, polynucleotide sequences encoding a fusion
protein comprising a leucine zipper can be expressed by a cellular host or by
using an in vitro translation system. Alternatively, leucine zippers and/or
antibody binding fragments can be produced separately, either by chemical
peptide synthesis, by expression of polynucleotide sequences encoding the
desired polypeptides, or by cleavage from other proteins containing leucine
zippers, antibodies, or macromolecular species, and subsequent purification.
Such purified polypeptides can be linked by peptide bonds, with or without
intervening spacer amino acid sequences, or by non-peptide covalent bonds,
with or without intervening spacer molecules, the spacer molecules being
either amino acids or other non-amino acid chemical structures. Regardless of
the method or type of linkage, such linkage can be reversible. For example, a
chemically labile bond, either peptidyl or otherwise, can be cleaved
spontaneously or upon treatment with heat, electromagnetic radiation,
proteases, or chemical agents. Two examples of such reversible linkage are:
(1) a linkage comprising an Asn-Gly peptide bond which can be cleaved by
hydroxylamine, and (2) a disulfide bond linkage which can be cleaved by
reducing agents.

Component antibody fragment-leucine zipper fusion proteins can be
annealed by co-expressing both fusion proteins in the same cell line.
Alternatively, the fusion proteins can be expressed in separate cell lines and
mixed in vitro. If the component antibody fragments include portions of a
constant region (e.g., Fab' fragments), the leucine zippers can be cleaved after
annealing has occurred. The component antibodies remain linked in the
bispecific antibody via the constant regions.

As used herein the term "functionally active antibody fragment" means
a fragment of an antibody molecule including a region that binds to a protein
or fragment thereof encoded by a smORF, wherein the antibody fragment
retains the T-cell stimulating functionality of an intact antibody having the
same specificity such as the deposited monoclonal antibodies. Such fragments
are also well known in the art and are regularly employed both in vitro and in
vivo. In particular, well-known functionally active antibody fragments include
but are not limited to F(ab')2, Fab, Fv, scFv and Fd fragments of antibodies.
These fragments, which lack the Fc fragment of an intact antibody, clear more
rapidly from the circulation, and may have less non-specific tissue binding
than an intact antibody. For example, single-chain antibodies can be
constructed in accordance with the methods described in U.S. Pat. No.
4,946,778 to Ladner et al. Such single-chain antibodies include the variable
regions of the light and heavy chains joined by a flexible linker moiety.
Methods for obtaining a single domain antibody ("Fd") which comprises an
isolated variable heavy chain single domain also have been reported (see, for
example, Ward et al., 1989, Nature 341: 644-646, disclosing a method of
screening to identify an antibody heavy chain variable region (VH single
domain antibody) with sufficient affinity for its target epitope to bind thereto
in isolated form). Methods for making recombinant Fv fragments based on
known antibody heavy chain and light chain variable region sequences are
known in the art and have been described, e.g., in U.S. Pat. No. 4,462,334.
Other references describing the use and generation of antibody fragments
include, e.g., Fab fragments (Tijssen, Practice and Theory of Enzyme
Immunoassays (Elsevier, Amsterdam, 1985)), Fv fragments (Hochman et
al., 1973, Biochemistry 12: 1130; Sharon et al., 1976, Biochemistry 15: 1591;
Ehrlich et al., U.S. Pat. No. 4,355,023) and portions of antibody molecules
(e.g., Auditore-Hargreaves, U.S. Pat. No. 4,470,925).

Functionally active antibody fragments also encompass "humanized
antibody fragments." As one skilled in the art will recognize, such fragments
could be prepared by traditional enzymatic cleavage of intact humanized
antibodies. If, however, intact antibodies are not susceptible to such cleavage,
because of the nature of the construction involved, the noted constructions can
be prepared with immunoglobulin fragments used as the starting materials; or,
if recombinant techniques are used, the DNA sequences, themselves, can be
tailored to encode the desired "fragment" which, when expressed, can be
combined in vivo or in vitro, by chemical or biological means, to prepare the
final desired intact immunoglobulin fragment.

Smaller antibody fragments and small binding polypeptides having
binding specificity are also contemplated. Several routine assays may be used
to easily identify such peptides. Screening assays for identifying peptides of
the invention are performed, for example, using phage display procedures such
as those described in Hart et al., 1994, J. Biol. Chem. 269: 12468. In general,
phage display libraries using, e.g., M13 or fd phage, are prepared using
conventional procedures such as those described in the foregoing reference.
The libraries display inserts containing from 4 to 80 amino acid residues. The
inserts optionally represent a completely degenerate or a biased array of
peptides. Ligands that bind selectively to a smORF polypeptide are obtained
by selecting those phages that express on their surface a ligand that binds to
the smORF polypeptide. These phages then are subjected to several cycles of
reselection to identify the peptide ligand-expressing phages that have the most
useful binding characteristics. Typically, phages that exhibit the best binding
characteristics (e.g., highest affinity) are further characterized by nucleic acid
analysis to identify the particular amino acid sequences of the peptides
expressed on the phage surface and the optimum length of the expressed
peptide to achieve optimum binding to the protein or polypeptide fragment
encoded by a smORF. Alternatively, such peptide ligands can be selected
from combinatorial libraries of peptides containing one or more amino acids.
Such libraries can further be synthesized which contain non-peptide synthetic
moieties, which are less subject to enzymatic degradation compared to their
naturally occurring counterparts.

Additionally, small polypeptides including those containing the smORF
polypeptide binding CDR3 region may easily be synthesized or produced by
recombinant means to produce the peptide of the invention. Such methods are
well known to those of ordinary skill in the art. Peptides can be synthesized,
for example, using automated peptide synthesizers, which are commercially
available. The peptides can be produced by recombinant techniques by
incorporating the DNA expressing the peptide into an expression vector and
transforming cells with the expression vector to produce the peptide.

The sequence of the CDR regions, for use in synthesizing the peptides
of the invention, may be determined by methods known in the art. The heavy
chain variable region is a peptide, which generally ranges from 100 to 150
amino acids in length (or any number in between). The light chain variable
region is a peptide, which generally ranges from 80 to 130 amino acids in
length (or any number in between). The CDR sequences within the heavy and
light chain variable regions, which include only approximately 3-25 amino
acids (including any number in between), may easily be sequenced
by one of ordinary skill in the art. The peptides may even be synthesized
by commercial sources such as the Scripps Protein and
Nucleic Acids Core Sequencing Facility (La Jolla, Calif.).

To determine whether a peptide binds to a smORF polypeptide, any
known binding assay may be employed. For example, the peptide may be
immobilized on a surface and then contacted with a labeled smORF
polypeptide. The amount of smORF polypeptide that interacts with the
peptide or the amount that does not bind to the peptide may then be
quantitated to determine whether the peptide binds to the smORF polypeptide.
A surface having the deposited monoclonal antibody immobilized thereto may
serve as a positive control.

Screening of peptides of the invention also can be carried out utilizing
a competition assay. If the peptide being tested competes with the deposited
monoclonal antibody, as shown by a decrease in binding of the deposited
monoclonal antibody, then it is likely that the peptide and the deposited
monoclonal antibody bind to the same, or a closely related, epitope. Still
another way to determine whether a peptide has the specificity of, for example,
a monoclonal antibody is to pre-incubate the deposited monoclonal antibody
with the smORF polypeptide with which it is normally reactive, and then add
the peptide being tested to determine if the peptide being tested is inhibited in
its ability to bind to the smORF polypeptide. If the peptide being tested is
inhibited then, in all likelihood, it has the same, or a functionally equivalent,
epitope and specificity as the deposited monoclonal antibody. Other methods
and assays would be evident to the artisan of ordinary skill.

D. Therapeutic Methods

Pharmaceutical compositions comprising bispecific antibodies of the
present invention are useful for parenteral administration, i.e., subcutaneously
(s.c.), intramuscularly (i.m.) and, particularly, intravenously (i.v.). Other
contemplated forms of administration, depending on the particular need,
would be oral, intrathecal, and intraperitoneal. The compositions for
parenteral administration commonly comprise a solution of the antibody or a
cocktail thereof dissolved in an acceptable carrier, preferably an aqueous
carrier. A variety of aqueous carriers can be used, e.g., water, buffered water,
0.4% saline, 0.3% glycine and the like. These solutions are sterile and
generally free of particulate matter. The compositions may contain
pharmaceutically acceptable auxiliary substances as required to approximate
physiological conditions such as pH adjusting and buffering agents, toxicity
adjusting agents and the like, for example sodium acetate, sodium chloride,
potassium chloride, calcium chloride, sodium lactate. The concentration of the
bispecific antibodies in these formulations can vary widely, i.e., from less than
about 0.01%, usually at least about 0.1% to as much as 5% by weight and will
be selected primarily based on fluid volumes and viscosities in accordance
with the particular mode of administration selected.

A typical antibody or antibody fragment composition for intravenous
infusion can be made up to contain, for example, 250 ml of sterile Ringer's
solution, and 10 mg of bispecific antibody. See Remington's
Pharmaceutical Science (15th Ed., Mack Publishing Company, Easton,
Pa., 1980).
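
For orientation, the example infusion above can be converted to a
concentration with simple arithmetic. The sketch below assumes the solution
has a density of about 1 g/ml, so that percent weight/volume approximates
percent by weight; the function names are hypothetical.

```python
def concentration_mg_per_ml(mass_mg: float, volume_ml: float) -> float:
    """Antibody concentration in mg per ml of solution."""
    return mass_mg / volume_ml


def percent_weight_per_volume(mass_mg: float, volume_ml: float) -> float:
    """Concentration as grams of antibody per 100 ml of solution."""
    grams = mass_mg / 1000.0
    return grams / volume_ml * 100.0


if __name__ == "__main__":
    # 10 mg of bispecific antibody in 250 ml of Ringer's solution.
    print(concentration_mg_per_ml(10, 250))    # about 0.04 mg/ml
    print(percent_weight_per_volume(10, 250))  # about 0.004 %
```

Under that density assumption the example works out to roughly 0.04 mg/ml, or
about 0.004%, which falls in the "less than about 0.01%" end of the range
stated earlier.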

The compositions containing the antibodies or a cocktail thereof can be
administered for prophylactic and/or therapeutic
treatments. In therapeutic application, compositions are administered to a
subject with a fungal infection, which expresses a smORF polypeptide of
interest. The amount administered to the patient is sufficient to cure or
ameliorate the infection or corresponding condition caused by the fungus. An
amount adequate to accomplish this is defined as a "therapeutically effective
dose." Amounts effective for use with antibodies or antibody fragments will
depend upon the severity of the condition and the general state of the subject,
but generally range from about 0.01 to about 100 mg of antibody per dose,
with dosages of from 0.1 to 50 mg and 1 to 10 mg per patient being more
commonly used. Single or multiple administrations on a daily, weekly or
monthly schedule can be carried out with dose levels and pattern being
selected by the treating physician.

In prophylactic applications, compositions containing the antibodies,
fragments or peptides which bind to smORF polypeptides or a cocktail thereof
are administered to a patient who is at risk of developing the disease state to
enhance the patient's resistance. Such an amount is defined to be a
"prophylactically effective dose." In this use, the precise amounts again
depend upon the subject's state of health and general level of immunity, but
generally range from 0.1 to 100 mg per dose, especially 1 to 10 mg per patient.

E. Diagnostic Methods

The antibodies and antibody fragments and peptides that bind to 
smORF polypeptides can also be useful in diagnostic methods for diagnosing 
fungal infections. Methods of diagnosis can be performed in vitro using a 
5 cellular sample (e.g., blood sample, lymph node biopsy or tissue) from a 
patient and performing a histological analysis of the sample, or can be 
performed by in vivo imaging. These methods are readily known in the art. 

While the present invention has been described with specificity in 
accordance with certain of its preferred embodiments, the examples discussed 
herein serve only to illustrate the invention and are not intended to limit the
same. 



F. Vaccines 

For smORFs identified using the methods described herein, the
proteins encoded by these smORFs may be determined to be useful for the
preparation of vaccines. Typically, proteins, or antigenic fragments thereof,
are chosen based on their exposure on the surface of a virus, cell or organism,
which exposes them to the immune cells of a host. Additionally, these proteins
and protein fragments must be antigenic or immunogenic (i.e., able to act as
an antigen that elicits a specific immune response when introduced into a
host).

The pharmaceutical compositions for use in obtaining an immune
response would contain such pharmaceutical excipients, adjuvants and/or
carriers as are standard in preparations designed to obtain an immune
response. The therapeutic response would be one wherein the pharmaceutical
composition has a protective effect on the subject to which it was
administered (i.e., preventing the subject from contracting an infection due
to the microorganism for which the subject had been treated).

(i) Selection of Immunogen. Vaccines against fungal organisms are
important to the treatment of a variety of diseases and conditions. For
example, Cryptococcus neoformans is an opportunistic fungal pathogen which
causes an incurable, life-threatening meningoencephalitis in patient
populations with AIDS. Coccidioidomycosis is another emerging health
problem in light of the increasing numbers of immunosuppressed patients.
Most infections are caused by Coccidioides immitis, which can advance into
coccidioidal pneumonia or extrapulmonary infection. Thus, vaccines against
these and other fungi are becoming more important, especially with
increasing numbers of immunocompromised individuals.

Selection of immunogen can be based on one or more factors such as
(1) cell surface exposure and availability of the protein to a host immune cell;
(2) predicted antigenicity/immunogenicity of the immunogen; (3) whether the
immunogen may be N- or O-linked glycosylated; and (4) whether the immunogen
is an extracellular protein (e.g., proteinases, esterases and lipases). Certain
glycosylated proteins have served as good antigens in raising an immune
response in animals, such as MP98 of Cryptococcus neoformans in mice
(Levitz et al., Proc. Natl. Acad. Sci. USA 98: 10422-27, 2001) and MP65
mannoprotein of Candida albicans (Antonio, Nippon Ishinkin Gakkai Zasshi
41: 219, 2000), and the cryptococcal capsular glucuronoxylomannan protected
against systemic mycosis in mice (Devi, Vaccine 14: 1298, 1996). Heat shock
proteins have also been identified as suitable candidates for antifungal
vaccines (Deepe et al., J. Immunol. 167: 2219-26, 2001).

(ii) Polypeptide and DNA Vaccines. Antifungal vaccines can be
prepared in a variety of ways. For purposes of this invention, living and
non-living (i.e., derived from the entire microorganism) fungal vaccines are
less preferred. More preferred are vaccine formulations that can be
administered as (1) polypeptides, (2) polypeptides conjugated to another
antigenic compound, or (3) direct inoculation of plasmid DNA encoding the
desired smORF, wherein expression is driven by a strong promoter capable of
efficient activity in a variety of mammalian cell types.

Once suitable immunogens are identified, protein-based vaccines can be
prepared wherein one or more smORF polypeptides (20-500 µg polypeptide,
more preferably about 50-150 µg) are mixed with a pharmaceutically
acceptable adjuvant. If testing in animals, an injection is administered to the
animal, followed by second and third injections a few weeks later. For
example, 100 µg of polypeptide (or combination of polypeptides) is admixed
with a desired adjuvant (e.g., Ribi adjuvant, RIBI ImmunoChem Research
Inc.). The material can be injected intramuscularly or subcutaneously in an
animal subject. In mice, the protectiveness of the vaccine can be measured by
footpad hypersensitivity testing. For instance, the peptide is prepared and
the hind footpads of the mice are injected with either 50 µl of spherule-phase
smORF polypeptide diluted in non-pyrogenic saline or saline alone.
Footpad thickness is then measured with a dual caliper and the results
calculated as the difference in footpad thickness of antigen- and saline-injected
pads at 18 to 25 hours minus the difference in footpad thickness of antigen-
and saline-injected pads before challenge. Lack of footpad sensitivity
indicates that the mice have received some protective immunity with the
injected antigen.
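
The footpad calculation described above can be written out as a short
sketch. The function name, the argument names and the example caliper
readings are illustrative assumptions and do not come from the
specification; all thicknesses are assumed to be in millimetres.

```python
def footpad_swelling(antigen_post, saline_post, antigen_pre, saline_pre):
    """Net antigen-specific footpad swelling.

    Computed as (antigen pad - saline pad) at 18 to 25 hours after
    injection, minus (antigen pad - saline pad) measured before challenge.
    All arguments are thicknesses in the same unit (assumed here: mm).
    """
    post_difference = antigen_post - saline_post
    pre_difference = antigen_pre - saline_pre
    return post_difference - pre_difference


# Hypothetical caliper readings for one mouse (mm), for illustration only.
swelling = footpad_swelling(
    antigen_post=2.5, saline_post=2.0, antigen_pre=2.0, saline_pre=2.0
)
print(swelling)
```

Subtracting the pre-challenge difference corrects for any baseline
asymmetry between the two hind footpads of the same animal.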

Additional methods for preparing, using and assaying pharmaceutical 
compositions for inducing a protective immune response can be performed 
according to what is known in the art. See, for example S.H.E. Kaufmann, 
Concepts in Vaccine Development (Walter De Gruyter 1996); Devi, Vaccine 
14: 841-4 (1996); Deepe et al., J. Immunol. 167: 2219-26 (2001) and Levitz et
al., Proc. Natl. Acad. Sci. USA 98: 10422-27 (2001). 

For purposes of conferring immunogenicity using a DNA vaccine, the
plasmid containing and operably linked to the desired smORF would be
administered, for example, as follows. The desired smORF would be operably
linked into a plasmid, such as pGEX-4-T3 (Pharmacia Biotech, Piscataway,
NJ), downstream from the gene encoding glutathione S-transferase (GST). The
smORF-containing plasmid is then amplified and preferably purified. Mice or
another suitable animal can then be immunized with the plasmid. If using
mice (for example, in an assay system), the mice are injected with 200 µl of
the smORF-containing plasmid (100 µg) or the plasmid alone (100 µg). The
plasmid is in a mixture with saline and admixed with an equal volume of Ribi
adjuvant (RIBI ImmunoChem Research, Inc.) or another adjuvant suitable for
DNA vaccines. Additional components may be present such as synthetic trehalose
dicorynomycolate (TDM) and cell wall skeleton. The DNA-containing
composition is typically administered intramuscularly or subcutaneously.
Second or third injections can also be given via intramuscular or subcutaneous
routes. The plasmid can also be administered intraperitoneally (i.p.). See,
e.g., Jiang et al., "Genetic Vaccination against Coccidioides immitis:
Comparison of Vaccine Efficacy of Recombinant Antigen 2 and Antigen 2
cDNA," Infection & Immun. 67: 630-5 (1999).

In vivo assays of animals, such as mice, can be performed to determine
the protectiveness of a particular smORF or smORFs or antigenic fragments
thereof. Once animals have been injected with the smORF DNA, as discussed
above, the animals can be challenged with exposure to the particular
microorganism. Typically, challenge is by intraperitoneal injection of the
microorganism into the animal, followed by assessment of survival of the
vaccinated mice as compared to control animals. See, e.g., Jiang et al., "Genetic
Vaccination against Coccidioides immitis: Comparison of Vaccine Efficacy of
Recombinant Antigen 2 and Antigen 2 cDNA," Infection & Immun. 67: 630-5
(1999). Additional methods of preparing, administering, and assaying such
compositions would be apparent to the artisan. See, for example,
"Development and Clinical Progress of DNA Vaccines: Paul-Ehrlich-Institut"
in Developments in Biologicals vol. 104 (F. Brown et al., eds., S. Karger
Publ., 2000); "DNA Vaccines: Methods and Protocols" in Methods in
Molecular Medicine vol. 29 (Douglas B. Lowrie and Robert G. Whalen, eds.,
Humana Press, 2000); Yvonne Paterson, Intracellular Bacterial Vaccine
Vectors: Immunology, Cell Biology, and Genetics (Wiley-Liss, 1999); Bruce
H. Nicholson, Synthetic Vaccines (Blackwell Science Inc. 1994); and Richard
E. Isaacson, Recombinant DNA Vaccines (Marcel Dekker, 1992).

All references discussed above are herein incorporated by reference in
their entirety.